Abstract: The source-coding problem with side information at
the decoder is studied subject to a constraint that the encoder,
to whom the side information is unavailable, be able to compute
the decoder's reconstruction sequence to within some distortion.

For discrete memoryless sources and finite single-letter distortion
measures, an expression is given for the minimal description
rate as a function of the joint law of the source and side
information and of the allowed distortions at the encoder and
at the decoder. The minimal description rate is also computed
for a memoryless Gaussian source with squared-error distortion
measures.

A solution is also provided to a more general problem where 
there are more than two distortion constraints and each distortion 
function may be a function of three arguments: the source 
symbol, the encoder's reconstruction symbol, and the decoder's 
reconstruction symbol. 



I. Introduction 

Like Wyner and Ziv [1], we study a setting where
a sequence generated by a source is to be described 
succinctly to a reconstructor ("decoder") with access to some 
side information. Wyner and Ziv showed that, although the 
side information is not available at the describing terminal 
("encoder"), it can be beneficial in improving the trade-off 
between the rate of description and the reconstruction distor- 
tion. They fully characterized this trade-off for memoryless 
sources with single-letter distortion measures. Unlike the case 
without side information — since the side information is used 
in the reconstruction process, and since the side information 
is not available at the describing terminal — the describing 
terminal cannot tell how the source sequence it observes 
will be reconstructed. In some settings, this is unacceptable. 
Steinberg [2] therefore studied the common-reconstruction
problem where an additional restriction is imposed that the 
reconstruction sequence be computable with probability nearly 
one at the describing terminal. This greatly limits the extent by 
which the reconstruction can depend on the side information. 
More generally, there is a tension between the degree by which 
the reconstructing terminal utilizes the side information and 
the precision with which the describing terminal can compute 
the reconstruction sequence. It is this tension that we study in 
this paper. 

The material in this paper was presented in part at the 2011 Information 
Theory and Applications Workshop and at the 2011 IEEE International 
Symposium on Information Theory. 

A. Lapidoth is with the Department of Information Technology and Electri- 
cal Engineering, ETH Zurich, Switzerland, (email: lapidoth@isi.ee.ethz.ch). 
A. Malar was with the Department of Information Technology and Electrical 
Engineering, ETH Zurich, Switzerland. He is now with Malcom AG, Zurich, 
Switzerland (email: andreas@malcom.ch). M. Wigger is with the Communi- 
cations and Electronics Department, Telecom ParisTech, Paris, France (email: 
michele.wigger@telecom-paristech.fr). 

The work of A. Malar was supported by an IDEA League student grant. 
The work of M. Wigger was supported by the "Emergences" grant of the city 
of Paris. 



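[Figure 1: block diagram. The encoder observes $X^n$, sends the index $M$ to the decoder, and outputs $\hat{X}_e^n$; the decoder observes $M$ and $Y^n$ and outputs $\hat{X}_d^n$. Constraints shown: $\frac{1}{n}\sum_{i=1}^{n} E[d_e(\hat{X}_{d,i},\hat{X}_{e,i})] \le D_e$ and $\frac{1}{n}\sum_{i=1}^{n} E[d_d(X_i,\hat{X}_{d,i})] \le D_d$.]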



Fig. 1. Constrained Wyner-Ziv coding. 



To quantify this tension, we require that the describing 
terminal generate an estimate of the sequence that will be 
produced at the reconstructing terminal (Figure 1). We then
study the distortions that can be simultaneously achieved at 
the describing terminal ("the encoder distortion") and at the 
reconstructing terminal ("the decoder distortion") as a function 
of the description rate. If the encoder's distortion function is 
the Hamming distance and if the allowed distortion is zero, 
then our problem reduces in essence to Steinberg's common-reconstruction
problem.¹ And if the allowed encoder distortion
is infinite, our problem reduces to Wyner and Ziv's problem. 
We can thus view our problem as a generalization of the 
Wyner-Ziv problem and Steinberg's common reconstruction 
problem. 

For discrete memoryless sources and finite single-letter 
distortion functions, we provide a single-letter characterization 
of the trade-off between the description rate and the distortions 
at the encoder and decoder sides. We also calculate this 
trade-off for a memoryless Gaussian source and squared-error 
distortion functions. Finally, in Section IV we generalize the
results to account for more than two constraints and to allow 
each distortion function to depend on three arguments: the 
source symbol, the encoder's reconstruction symbol, and the 
decoder's reconstruction symbol. 

Steinberg's work was also extended in other ways. Kittichokechai,
Oechtering, and Skoglund [3] determined the rate-distortion
function under a common-reconstruction constraint
for a modified Wyner-Ziv setup where the encoder can influence
the decoder's side information via an action-generator.
Timo, Grant, and Kramer [4], and Ahmadi, Tandon, Simeone,
and Poor [6], [7] derived the rate-distortions function under
a common-reconstruction constraint for two special cases
of the Heegard-Berger/Kaspi problem (the Wyner-Ziv problem
with two decoders): [6], [7] for physically degraded side
informations, and [4], [5] for complementary side informations.
Ahmadi, Tandon, Simeone, and Poor [6], [7] also presented
the rates-distortions function under a common-reconstruction
constraint for a cascade source-coding problem when the
side informations are physically degraded. Finally, already
in [2], Steinberg studied the implications of the common-reconstruction
constraint on the simultaneous transmission of
data and state and on joint source-channel coding for the
degraded broadcast channel.

1 For a precise statement see Remark 3 in Section II-B ahead.

The paper is organized as follows. In the rest of this section 
we introduce our notation. In Section II we treat discrete
sources and general distortions, and in Section III Gaussian
sources with quadratic distortions. In Section IV we revisit
discrete sources but this time with more and more general 
distortion constraints. 

A. Notation 

Random variables are denoted by upper-case letters and 
their realizations by lower-case letters. Vectors are denoted 
by bold-face letters: random vectors by upper-case bold- 
face letters, and deterministic vectors by lower-case bold-face 
letters. Sets and events are denoted by calligraphic letters, e.g.,
$\mathcal{A}$. An n-tuple $(A_1, \ldots, A_n)$ is denoted $A^n$, and the n-fold
Cartesian product of the set $\mathcal{A}$ is denoted $\mathcal{A}^n$. The convex hull
of a set $\mathcal{A}$ is denoted by $\mathrm{conv}(\mathcal{A})$. To indicate that the random
variables $A$ and $C$ are conditionally independent given $B$ we
write

$A -\!\!\circ\!\!- B -\!\!\circ\!\!- C$.

The transpose of a vector $\mathbf{a}$ is denoted by $\mathbf{a}^{\mathsf{T}}$; its Euclidean
norm by $\|\mathbf{a}\|$; and the Euclidean inner product between the
vectors $\mathbf{a}$ and $\mathbf{b}$ by $\langle \mathbf{a}, \mathbf{b} \rangle$. The set of real numbers is denoted
$\mathbb{R}$ and its d-fold Cartesian product $\mathbb{R}^d$. The nonnegative reals
are denoted $\mathbb{R}_+$, and the positive reals $\mathbb{R}_{++}$. The respective
d-fold Cartesian products are denoted $\mathbb{R}_+^d$ and $\mathbb{R}_{++}^d$. We use
$I(\cdot)$ to denote the indicator function: $I(\text{statement})$ is equal to
one if the statement is true and is equal to zero if it is false.
Throughout the paper $\log(\cdot)$ denotes base-2 logarithm, and
$\log^+(\xi) = \max\{\log \xi, 0\}$. The abbreviation IID stands for
independently and identically distributed.

II. Discrete Memoryless Source and General 
Distortions 

A. Problem Statement 

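Our setting is illustrated in Figure 1 and is specified by a
tuple

$(\mathcal{X}, \mathcal{Y}, \hat{\mathcal{X}}, P_{XY}, d_d, d_e, D_d, D_e)$,

where $\mathcal{X}$, $\mathcal{Y}$, $\hat{\mathcal{X}}$ are finite sets; $P_{XY}$ is a probability distribution
on $\mathcal{X} \times \mathcal{Y}$; $d_d(\cdot,\cdot)$ and $d_e(\cdot,\cdot)$ are nonnegative functions

$d_d : \mathcal{X} \times \hat{\mathcal{X}} \to \mathbb{R}_+$ (1)
$d_e : \hat{\mathcal{X}} \times \hat{\mathcal{X}} \to \mathbb{R}_+$; (2)

and $D_d$ and $D_e$ are nonnegative real numbers.

The sets $\mathcal{X}$, $\mathcal{Y}$, and $\hat{\mathcal{X}}$ model the source, side-information,
and reconstruction alphabets. A source sequence $X^n \in \mathcal{X}^n$ is
observed at the encoder (but not at the decoder) and a side-information
sequence $Y^n \in \mathcal{Y}^n$ at the decoder (but not at the
encoder). The sequence of pairs $\{(X_i, Y_i)\}_{i=1}^{n}$ is assumed to
be drawn IID according to the joint law $P_{XY}$.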

The encoder describes the source sequence $X^n$ to the
decoder by an index

$M = f^{(n)}(X^n)$ (3)

where

$f^{(n)} : \mathcal{X}^n \to \mathcal{M}$ (4)

is the encoding function and

$\mathcal{M} \triangleq \{1, \ldots, |\mathcal{M}|\}$. (5)

Based on the index $M$ and its side information $Y^n$, the decoder
forms a reconstruction sequence

$\hat{X}_d^n = \phi^{(n)}(M, Y^n)$ (6)

where

$\phi^{(n)} : \mathcal{M} \times \mathcal{Y}^n \to \hat{\mathcal{X}}^n$ (7)

is the decoder's reconstruction function. The encoder's estimate
of the decoder's reconstruction sequence is

$\hat{X}_e^n = \psi^{(n)}(X^n)$ (8)

for some

$\psi^{(n)} : \mathcal{X}^n \to \hat{\mathcal{X}}^n$. (9)

The goal of the communication is that the decoder's reconstruction
$\hat{X}_d^n$ matches the source sequence $X^n$ up to a
distortion no larger than $D_d$ and the encoder's estimate $\hat{X}_e^n$
matches the decoder's reconstruction $\hat{X}_d^n$ up to a distortion
no larger than $D_e$. The distortions are measured by the
bounded, nonnegative, single-letter distortion functions $d_d(\cdot,\cdot)$
and $d_e(\cdot,\cdot)$.

We say that a nonnegative triple $(R, D_d, D_e)$ is achievable if
for every $\epsilon > 0$ and sufficiently large $n$ there exists a message
set of size

$|\mathcal{M}| \le 2^{n(R+\epsilon)}$ (10)

and a triple of functions $(f^{(n)}, \phi^{(n)}, \psi^{(n)})$ as above such that
the decoder-side reconstruction constraint

$\frac{1}{n} \sum_{i=1}^{n} E[d_d(X_i, \hat{X}_{d,i})] \le D_d + \epsilon$ (11)

and the encoder-side reconstruction constraint

$\frac{1}{n} \sum_{i=1}^{n} E[d_e(\hat{X}_{d,i}, \hat{X}_{e,i})] \le D_e + \epsilon$ (12)

are both met.
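
The constraints (11) and (12) are expectations of per-symbol averages. As a minimal sketch, the helper below computes the corresponding averages for one realization of the three sequences. The names `average_distortions` and `hamming` and the list-based representation of sequences are illustrative assumptions, not notation from the paper.

```python
def average_distortions(x, x_dec, x_enc, d_d, d_e):
    """Per-symbol average distortions for one realization.

    x      : source sequence
    x_dec  : decoder's reconstruction sequence
    x_enc  : encoder's estimate of the decoder's reconstruction
    d_d    : decoder-side single-letter distortion d_d(x, x_dec)
    d_e    : encoder-side single-letter distortion d_e(x_dec, x_enc)
    """
    n = len(x)
    if n == 0 or len(x_dec) != n or len(x_enc) != n:
        raise ValueError("sequences must be nonempty and of equal length")
    # Decoder side: source symbol versus decoder reconstruction.
    decoder_side = sum(d_d(a, b) for a, b in zip(x, x_dec)) / n
    # Encoder side: decoder reconstruction versus encoder estimate.
    encoder_side = sum(d_e(b, c) for b, c in zip(x_dec, x_enc)) / n
    return decoder_side, encoder_side


def hamming(u, v):
    """Hamming distortion: 1 if the symbols differ, 0 otherwise."""
    return 0.0 if u == v else 1.0
```

Averaging these quantities over many independent realizations would approximate the expectations that appear in (11) and (12).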

Our problem is not very interesting if the distortion con- 
straints cannot be met even when the source sequence is 
revealed losslessly to the reconstructor. Consequently, we shall 
make the following assumption throughout: 

Assumption 1: The distortion functions $d_d$ and $d_e$ are such
that for each $x \in \mathcal{X}$ there exist $\hat{x}_d, \hat{x}_e \in \hat{\mathcal{X}}$ satisfying
$d_d(x, \hat{x}_d) = 0$ and $d_e(\hat{x}_d, \hat{x}_e) = 0$.

As we shall see, this assumption ensures that the triple
$(R, D_d, D_e)$ is achievable whenever $R \ge H(X|Y)$.

We are interested in finding the smallest rate $R$ such that a
given distortion pair $D_d, D_e$ is achievable. For given $D_d, D_e \ge 0$,
let $\mathcal{R}(D_d, D_e)$ denote the set of rates $R \ge 0$ such that the
tuple $(R, D_d, D_e)$ is achievable:

$\mathcal{R}(D_d, D_e) = \{R \ge 0 : (R, D_d, D_e) \text{ is achievable}\}$. (13)

Notice that by Assumption 1, the set $\mathcal{R}(D_d, D_e)$
contains all rates $R \ge H(X|Y)$ and is thus nonempty. We
can now define the rate-distortions function as

$R(D_d, D_e) = \min_{R \in \mathcal{R}(D_d, D_e)} R$, (14)

where the minimum exists because the set $\mathcal{R}(D_d, D_e)$ is
nonempty, closed, and bounded from below by 0.

B. Related Setups 

Wyner and Ziv's classic lossy source-coding problem with
side information [1] is similar to our problem except that
Wyner and Ziv do not impose the encoder-side reconstruction
constraint (12). Informally, our problem thus reduces to the
Wyner-Ziv problem if we set $D_e$ to infinity. Wyner and Ziv's
result can be summarized as follows:

Theorem 1 (Wyner and Ziv [1]): The rate-distortion function
$R_{\mathrm{WZ}}(D_d)$ in the Wyner-Ziv setup is given by

$R_{\mathrm{WZ}}(D_d) = \min \big( I(X;Z) - I(Y;Z) \big)$ (15)

where $(X, Y) \sim P_{XY}$, and where the minimization is over all
functions $\phi : \mathcal{Y} \times \mathcal{Z} \to \hat{\mathcal{X}}$ and discrete random variables $Z$ for
which: $Z$ takes values in an auxiliary alphabet $\mathcal{Z}$ of size at
most $|\mathcal{X}| + 1$;

$Z -\!\!\circ\!\!- X -\!\!\circ\!\!- Y$ (16)

forms a Markov chain; and

$E[d_d(X, \phi(Y,Z))] \le D_d$. (17)

Since imposing the encoder-side reconstruction constraint (12)
cannot increase the set of achievable rates,

$R(D_d, D_e) \ge R_{\mathrm{WZ}}(D_d)$. (18)

Equality holds whenever the encoder-side reconstruction constraint
(12) does not pinch. For example, when $\hat{\mathcal{X}} = \mathcal{X}$;
$D_d = D_e$; and

$d_e(\hat{x}, x) = d_d(x, \hat{x}), \quad x, \hat{x} \in \mathcal{X}$. (19)

Indeed, in this case the encoder can set $\hat{X}_{e,i}$ to be $X_i$. This
results in (12) being identical to (11) and thus superfluous.

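Steinberg's setup in [2] is obtained from ours by replacing
the encoder-side distortion constraint (12) by the more stringent
perfect-reconstruction constraint

$\Pr[\hat{X}_e^n \ne \hat{X}_d^n] \le \epsilon$. (20)

Theorem 2 (Steinberg [2]): The rate-distortion function
$R_{\mathrm{cr}}(D_d)$ in Steinberg's setup is given by

$R_{\mathrm{cr}}(D_d) = \min \big( I(X;\hat{X}) - I(Y;\hat{X}) \big)$, (21)

where the minimization is over all $\hat{X}$ taking value in $\hat{\mathcal{X}}$ and
satisfying

$\hat{X} -\!\!\circ\!\!- X -\!\!\circ\!\!- Y$ (22)

and

$E[d_d(X, \hat{X})] \le D_d$. (23)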



Remark 3: Constraint (20) is equivalent to the block-distortion
constraint

$E\big[ I\{\hat{X}_e^n \ne \hat{X}_d^n\} \big] \le \epsilon$. (24)

Thus, when in our setup $d_e(\cdot,\cdot)$ is the Hamming distortion
and $D_e = 0$, then Steinberg's setup differs from ours only in
that (20) is a block-distortion constraint whereas (12) is an
average-per-symbol distortion constraint.
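
To illustrate the difference between the two constraint types, the sketch below evaluates, for one pair of sequences, the block indicator that underlies (20) and the per-symbol Hamming average that underlies (12). The function name is a hypothetical choice made for this illustration. The block indicator is never smaller than the per-symbol average, which is why the block constraint is the more demanding one.

```python
def block_and_symbol_mismatch(x_dec, x_enc):
    """Return (block indicator, per-symbol Hamming average) for two sequences."""
    n = len(x_dec)
    if n == 0 or len(x_enc) != n:
        raise ValueError("sequences must be nonempty and of equal length")
    mismatches = sum(1 for a, b in zip(x_dec, x_enc) if a != b)
    # Block view: 1 as soon as a single position differs.
    block = 1.0 if mismatches > 0 else 0.0
    # Per-symbol view: fraction of positions that differ.
    per_symbol = mismatches / n
    return block, per_symbol
```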

C. Results 

To describe the rate-distortions function for the setup of
Section II-A we introduce the function $\bar{R}(D_d, D_e)$. The
expression for $\bar{R}(D_d, D_e)$ in (25) is similar to the expression for
$R_{\mathrm{WZ}}(D_d)$ in (15) except that in the expression for $\bar{R}(D_d, D_e)$
we have the additional constraint (28).

Given the joint law $P_{XY}$ of the source and side information,
and given the distortion functions $d_d$, $d_e$, this function is
defined as

$\bar{R}(D_d, D_e) = \min \big( I(X;Z) - I(Y;Z) \big)$ (25)

where the minimization is over all discrete random variables $Z$
taking value in some finite auxiliary alphabet $\mathcal{Z}$ and forming
the Markov chain

$Z -\!\!\circ\!\!- X -\!\!\circ\!\!- Y$ (26)

and over the functions $\phi : \mathcal{Y} \times \mathcal{Z} \to \hat{\mathcal{X}}$ and $\psi : \mathcal{X} \times \mathcal{Z} \to \hat{\mathcal{X}}$
satisfying

$E[d_d(X, \phi(Y,Z))] \le D_d$ (27)
$E[d_e(\phi(Y,Z), \psi(X,Z))] \le D_e$. (28)

Note that, thanks to Assumption 1, the feasible set in (25)
is not empty: we can choose $Z$ as $X$ and $\phi$, $\psi$ as the functions
whose existence is guaranteed by the assumption. This choice
demonstrates that

$\bar{R}(D_d, D_e) \le H(X|Y)$. (29)



Using the convex cover method [8] it can be shown that:

Remark 4: Allowing for sets $\mathcal{Z}$ of cardinality greater than
$|\mathcal{X}| + 3$ does not decrease the value of the optimization
problem.

A consequence of this remark is that the minimum in (25) is
achieved: indeed, we may choose $\mathcal{Z}$ as the set $\{1, \ldots, |\mathcal{X}| + 3\}$
with the result that there are only a finite number of functions $\phi$,
$\psi$, and the problem is reduced to minimizing a continuous
function over a compact set.
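
As a sketch of what one point of the feasible set in (25) looks like numerically, the code below evaluates the objective $I(X;Z) - I(Y;Z)$ and the two expected distortions in (27) and (28) for a given test channel $P_{Z|X}$ and functions $\phi$, $\psi$. The dictionary-based representation of the distributions and the function names are assumptions made for this illustration; the code evaluates a single candidate and does not perform the minimization.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """I(A;B) in bits for a joint pmf given as {(a, b): probability}."""
    p_a, p_b = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        p_a[a] += p
        p_b[b] += p
    return sum(p * math.log2(p / (p_a[a] * p_b[b]))
               for (a, b), p in joint.items() if p > 0)


def evaluate_candidate(p_xy, p_z_given_x, phi, psi, d_d, d_e):
    """Objective and expected distortions for one (Z, phi, psi).

    p_xy        : {(x, y): probability}
    p_z_given_x : {x: {z: probability}}; Z depends on X only (Markov chain Z - X - Y)
    phi(y, z)   : decoder reconstruction function
    psi(x, z)   : encoder estimate function
    """
    p_xz, p_yz = defaultdict(float), defaultdict(float)
    dist_d = dist_e = 0.0
    for (x, y), pxy in p_xy.items():
        for z, pz in p_z_given_x[x].items():
            p = pxy * pz                      # P(x, y, z) under the Markov chain
            p_xz[(x, z)] += p
            p_yz[(y, z)] += p
            x_dec, x_enc = phi(y, z), psi(x, z)
            dist_d += p * d_d(x, x_dec)       # left-hand side of (27)
            dist_e += p * d_e(x_dec, x_enc)   # left-hand side of (28)
    rate = mutual_information(p_xz) - mutual_information(p_yz)
    return rate, dist_d, dist_e
```

For the choice $Z = X$ with $\phi(y,z) = \psi(x,z) = z$, the objective evaluates to $H(X|Y)$ and both distortions are zero under Hamming distortions, which is the choice behind (29).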

The key properties of $\bar{R}(D_d, D_e)$ are summarized in the
following proposition:

Proposition 5 (Key Properties of the Function $\bar{R}(D_d, D_e)$):
The function $\bar{R}(D_d, D_e) : \mathbb{R}_+^2 \to \mathbb{R}_+$ is bounded from above
by $H(X|Y)$ and is nonincreasing in the distortions:

$\big( D_d' \ge D_d \text{ and } D_e' \ge D_e \big) \implies \bar{R}(D_d', D_e') \le \bar{R}(D_d, D_e)$.

Moreover, it is convex and continuous.

Proof: See Appendix B. ■
Our main result can be now stated as: 

Theorem 6: The rate-distortions function for the setup in
Section II-A is equal to $\bar{R}(D_d, D_e)$:

$R(D_d, D_e) = \bar{R}(D_d, D_e)$. (30)



Proof of Theorem 6: The coding scheme that establishes
achievability is a variation on the coding scheme of Wyner
and Ziv [1] and is thus only sketched. Its analysis is omitted.

Fix $Z$, $\phi$, $\psi$ satisfying (26)-(28), and fix also a blocklength
$n$ and some (small) $\epsilon > 0$. Let $\mathcal{C}$ be a random
blocklength-n codebook with $\lfloor 2^{n(I(X;Z) - I(Y;Z) + 2\epsilon)} \rfloor$ bins,
each containing approximately $2^{n(I(Y;Z) - \epsilon)}$ codewords with
the total number of codewords thus being $\lfloor 2^{n(I(X;Z) + \epsilon)} \rfloor$.
Generate the codewords independently with the components
of each codeword being drawn IID $P_Z$. Number the bins 1
through $\lfloor 2^{n(I(X;Z) - I(Y;Z) + 2\epsilon)} \rfloor$.

Upon observing the source sequence $X^n$, the encoder seeks
a codeword $Z^{*n}$ in $\mathcal{C}$ that is jointly typical with $X^n$. If
successful, it sends the number of the bin containing $Z^{*n}$
as the message $M$. It also produces the reconstruction sequence
$\hat{X}_e^n$ by applying the function $\psi$ componentwise to $Z^{*n}$
and $X^n$. The decoder seeks a codeword $\hat{Z}^n$ in Bin $M$ that is
jointly typical with its side information $Y^n$ and applies the
reconstruction function $\phi$ componentwise to $\hat{Z}^n$ and $Y^n$ to
produce $\hat{X}_d^n$.
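
The bookkeeping of the binning scheme can be sketched as follows. The function below only counts bins and codewords for given values of $I(X;Z)$, $I(Y;Z)$, $n$, and $\epsilon$; it does not generate a codebook or test typicality. Its name and signature are illustrative assumptions, and the floor rounding follows the reading of the codebook sizes used in this sketch.

```python
import math


def binning_sizes(n, i_xz, i_yz, eps):
    """Number of bins, total codewords, and average codewords per bin.

    n    : blocklength
    i_xz : I(X;Z) in bits
    i_yz : I(Y;Z) in bits
    eps  : small positive slack
    """
    num_bins = math.floor(2 ** (n * (i_xz - i_yz + 2 * eps)))
    total_codewords = math.floor(2 ** (n * (i_xz + eps)))
    # Roughly 2^{n(I(Y;Z) - eps)} codewords end up in each bin.
    per_bin = total_codewords / num_bins
    return num_bins, total_codewords, per_bin
```

The description rate is $\frac{1}{n}\log$ of the number of bins, i.e., about $I(X;Z) - I(Y;Z) + 2\epsilon$ bits per source symbol.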

The converse is proved in Subsection II-D. ■

Though not identical, Steinberg's setup is very similar to our
setup when $d_e(\cdot,\cdot)$ is the Hamming distortion and $D_e$ is zero
(Remark 3). It is therefore not surprising that, as the following
corollary shows, the two setups lead to identical rates:

Corollary 7: Let $d_d(\cdot,\cdot)$ be arbitrary, and let $d_e(\cdot,\cdot)$ be the
Hamming distortion function

$d_e(\hat{x}_d, \hat{x}_e) = I\{\hat{x}_d \ne \hat{x}_e\}, \quad \hat{x}_d, \hat{x}_e \in \hat{\mathcal{X}}$. (31)

Then

$R(D_d, D_e)\big|_{D_e = 0} = R_{\mathrm{cr}}(D_d)$. (32)

Proof: See Appendix A. ■
Remark 8: Our results can be extended to a scenario where
the encoder observes not only the source sequence $\{X_i\}$
but also some sequence $\{W_i\}$ which is correlated with the
decoder's side-information sequence $\{Y_i\}$. This additional
sequence $\{W_i\}$ makes it easier for the encoder to estimate the
decoder's reconstruction sequence and thus allows the decoder
to rely more heavily on its side information $\{Y_i\}$. To see how
this seemingly more general scenario reduces to our scenario,
assume that $\{(X_i, W_i, Y_i)\}_{i=1}^{n}$ are IID random triples of law
$P_{XWY}$ and that $W_i$ takes value in the finite set $\mathcal{W}$. Consider
now a new IID source $\{\tilde{X}_i\}$ taking value in the set $\tilde{\mathcal{X}} = \mathcal{X} \times \mathcal{W}$
according to the law $P_{XW}$ with $\tilde{X}_i = (X_i, W_i)$. The encoder
now observes the source sequence $\{\tilde{X}_i\}$ only and no additional
sequences. The decoder side information is still $\{Y_i\}$, and the
joint law of $\tilde{X}_i, Y_i$ is $P_{XWY}$. Finally define the new decoder
distortion function $\tilde{d}_d : \tilde{\mathcal{X}} \times \hat{\mathcal{X}} \to \mathbb{R}_+$ as

$\tilde{d}_d((X_i, W_i), \hat{X}_i) = d_d(X_i, \hat{X}_i)$,

i.e., the distortion function $\tilde{d}_d$ does not depend on the $W_i$-component.
Solving the original scenario for this new source
and new decoder distortion function is equivalent to solving
the seemingly more general problem we described.
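
The reduction in Remark 8 amounts to wrapping the decoder distortion function so that it accepts the augmented source symbol $(x, w)$ and ignores $w$. A minimal sketch, in which the name `lift_decoder_distortion` is a hypothetical choice:

```python
def lift_decoder_distortion(d_d):
    """Return the distortion on augmented symbols (x, w) that ignores w."""
    def lifted(augmented_symbol, x_hat):
        x, _w = augmented_symbol   # the W-component plays no role
        return d_d(x, x_hat)
    return lifted
```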

D. Proof of the Converse to Theorem 6

To establish the converse, we show that if a triple
$(R, D_d, D_e)$ is achievable, then for every $\epsilon > 0$

$R + \epsilon \ge \bar{R}(D_d + \epsilon, D_e + \epsilon)$. (33)

Since $\bar{R}(D_d, D_e)$ is continuous (Proposition 5), and since $\epsilon$
can be arbitrarily small, this implies that $R \ge \bar{R}(D_d, D_e)$
whenever $(R, D_d, D_e)$ is achievable, and consequently that
$R(D_d, D_e) \ge \bar{R}(D_d, D_e)$.

The first part of our proof, identifying the auxiliary random
variable $Z_i$ (44) and the function $\phi_i$ (46), is similar to the proof
of the Wyner-Ziv result [8]. For a given blocklength-n code
$f^{(n)}, \phi^{(n)}, \psi^{(n)}$ satisfying (10)-(12), we have

$$
\begin{aligned}
n(R + \epsilon) &\overset{(a)}{\ge} H(M) \\
&\overset{(b)}{\ge} I(X^n; M | Y^n) \\
&\overset{(c)}{=} \sum_{i=1}^{n} I(X_i; M | Y^n, X^{i-1}) \\
&= \sum_{i=1}^{n} H(X_i | Y^n, X^{i-1}) - H(X_i | M, Y^n, X^{i-1}) \\
&\overset{(d)}{=} \sum_{i=1}^{n} H(X_i | Y_i) - H(X_i | M, Y^n, X^{i-1}) \\
&\overset{(e)}{\ge} \sum_{i=1}^{n} H(X_i | Y_i) - H(X_i | M, Y^n) \\
&\overset{(f)}{=} \sum_{i=1}^{n} H(X_i | Y_i) - H(X_i | Z_i, Y_i) \\
&= \sum_{i=1}^{n} I(X_i; Z_i | Y_i) \\
&\overset{(g)}{=} \sum_{i=1}^{n} I(X_i; Z_i) - I(Y_i; Z_i), \qquad (43)
\end{aligned}
$$

where (a) follows by (10); (b) follows because conditioning
cannot increase entropy and because $H(M | Y^n, X^n) \ge 0$;
(c) follows from the chain rule for mutual information; (d)
follows because the pair $X_i, Y_i$ is independent of the tuple
$(X^{i-1}, Y^{i-1}, Y_{i+1}^{n})$; (e) follows from the fact that conditioning
cannot increase entropy; (f) follows by defining

$Z_i \triangleq (M, Y^{i-1}, Y_{i+1}^{n})$; (44)

and (g) follows because with the definition above

$Z_i -\!\!\circ\!\!- X_i -\!\!\circ\!\!- Y_i$. (45)

Denote by $\phi_i^{(n)}$ the function that maps $(M, Y^n)$ to the i-th
component of the n-tuple $\phi^{(n)}(M, Y^n)$, and denote by $\psi_i^{(n)}$
the function that maps $X^n$ to the i-th component of the n-tuple
$\psi^{(n)}(X^n)$. Since there is a one-to-one correspondence
between the pairs $(Y_i, Z_i)$ and $(M, Y^n)$, we can define a
function $\phi_i$ of $(Y_i, Z_i)$ through

$\phi_i(Y_i, Z_i) \triangleq \phi_i^{(n)}(M, Y^n)$. (46)

We now define

$D_{d,i} \triangleq E\big[ d_d(X_i, \phi_i^{(n)}(M, Y^n)) \big]$ (47)

where $E[\cdot]$ is with respect to $P_{X^n Y^n}$. By definitions (46) and
(47),

$D_{d,i} = E\big[ d_d(X_i, \phi_i(Y_i, Z_i)) \big]$ (48)

where $E[\cdot]$ is with respect to $P_{X_i Y_i} P_{Z_i | X_i}$.

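We next turn to the encoder-side distortion. We will show
that there exists a deterministic function $\psi_i : \mathcal{X} \times \mathcal{Z} \to \hat{\mathcal{X}}$ that
achieves a distortion no larger than $D_{e,i}$, where $D_{e,i}$ is the
distortion achieved by $\psi_i^{(n)}(X^n)$, namely,

$D_{e,i} \triangleq E\big[ d_e(\phi_i^{(n)}(M, Y^n), \psi_i^{(n)}(X^n)) \big]$. (49)

To this end, we express $D_{e,i}$ as

$$
\begin{aligned}
D_{e,i} &= E\big[ d_e(\phi_i(Y_i, Z_i), \psi_i^{(n)}(X^n)) \big] && (50) \\
&= E_{X^n, Z_i} E_{Y_i | X^n, Z_i}\big[ d_e(\phi_i(Y_i, Z_i), \psi_i^{(n)}(X_i, X_{\setminus i})) \big] && (51) \\
&= E_{X^n, Z_i} E_{Y_i | X_i, X_{\setminus i}, Z_i}\big[ d_e(\phi_i(Y_i, Z_i), \psi_i^{(n)}(X_i, X_{\setminus i})) \big], && (52)
\end{aligned}
$$

where $X_{\setminus i} \triangleq (X^{i-1}, X_{i+1}^{n})$. For every $(x_i, z_i) \in \mathcal{X} \times \mathcal{Z}$, we
define $x_{\setminus i}^{*}(x_i, z_i)$ (or for short $x_{\setminus i}^{*}$) as²

$x_{\setminus i}^{*}(x_i, z_i) = \arg\min_{x_{\setminus i}} E_{Y_i | X_i = x_i, X_{\setminus i} = x_{\setminus i}, Z_i = z_i}\big[ d_e(\phi_i(Y_i, Z_i), \psi_i^{(n)}(x_i, x_{\setminus i})) \big]$ (53)

or in any other way that guarantees, for every $x_{\setminus i}$,

$$
E_{Y_i | X_i = x_i, X_{\setminus i} = x_{\setminus i}, Z_i = z_i}\big[ d_e(\phi_i(Y_i, Z_i), \psi_i^{(n)}(x_i, x_{\setminus i})) \big]
\ge E_{Y_i | X_i = x_i, X_{\setminus i} = x_{\setminus i}^{*}, Z_i = z_i}\big[ d_e(\phi_i(Y_i, Z_i), \psi_i^{(n)}(x_i, x_{\setminus i}^{*})) \big]. \qquad (54)
$$

We can now define the function $\psi_i$ as

$\psi_i : \mathcal{X} \times \mathcal{Z} \to \hat{\mathcal{X}}$ (55a)
$(x_i, z_i) \mapsto \psi_i^{(n)}(x_i, x_{\setminus i}^{*}(x_i, z_i))$. (55b)

For every $(x_i, x_{\setminus i}, z_i) \in \mathcal{X}^n \times \mathcal{Z}$, we have

$$
\begin{aligned}
E_{Y_i | X_i = x_i, X_{\setminus i} = x_{\setminus i}, Z_i = z_i}&\big[ d_e(\phi_i(Y_i, Z_i), \psi_i^{(n)}(x_i, x_{\setminus i})) \big] \\
&\overset{(a)}{\ge} E_{Y_i | X_i = x_i, X_{\setminus i} = x_{\setminus i}^{*}, Z_i = z_i}\big[ d_e(\phi_i(Y_i, Z_i), \psi_i^{(n)}(x_i, x_{\setminus i}^{*})) \big] && (56) \\
&\overset{(b)}{=} E_{Y_i | X_i = x_i, Z_i = z_i}\big[ d_e(\phi_i(Y_i, Z_i), \psi_i^{(n)}(x_i, x_{\setminus i}^{*})) \big] && (57) \\
&\overset{(c)}{=} E_{Y_i | X_i = x_i, Z_i = z_i}\big[ d_e(\phi_i(Y_i, Z_i), \psi_i(x_i, z_i)) \big], && (58)
\end{aligned}
$$

where (a) follows from the definition of $x_{\setminus i}^{*}$; (b) follows
because

$X_{\setminus i} -\!\!\circ\!\!- (X_i, Z_i) -\!\!\circ\!\!- Y_i$;

and (c) follows from the definition of $\psi_i$ (55).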
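It now follows from (52) and (58) that

$E\big[ d_e(\phi_i(Y_i, Z_i), \psi_i(X_i, Z_i)) \big] \le D_{e,i}$. (60)

Continuing from (43) we thus obtain

$$
\begin{aligned}
n(R + \epsilon) &\ge \sum_{i=1}^{n} I(X_i; Z_i) - I(Y_i; Z_i) \\
&\overset{(a)}{\ge} \sum_{i=1}^{n} \bar{R}(D_{d,i}, D_{e,i}) \\
&\overset{(b)}{=} n \cdot \frac{1}{n} \sum_{i=1}^{n} \bar{R}(D_{d,i}, D_{e,i}) \\
&\overset{(c)}{\ge} n \bar{R}\Big( \frac{1}{n} \sum_{i=1}^{n} D_{d,i}, \frac{1}{n} \sum_{i=1}^{n} D_{e,i} \Big) \\
&\overset{(d)}{\ge} n \bar{R}(D_d + \epsilon, D_e + \epsilon),
\end{aligned}
$$

where (a) follows from the definition of $\bar{R}(D_d, D_e)$ and
from (45), (48), and (60); (b) follows by multiplying by
1; (c) follows from the convexity of $\bar{R}(D_d, D_e)$ (Proposition 5);
and (d) follows from the monotonicity of $\bar{R}(D_d, D_e)$
(Proposition 5) and the fact that $\frac{1}{n} \sum_{i=1}^{n} D_{d,i} \le D_d + \epsilon$
and $\frac{1}{n} \sum_{i=1}^{n} D_{e,i} \le D_e + \epsilon$. This establishes (33) and thus
concludes the proof of the converse.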

III. Gaussian Source and Quadratic Distortions 
A. Setup 

We next consider the case where the source, side-information,
and reconstruction alphabets $\mathcal{X}$, $\mathcal{Y}$, $\hat{\mathcal{X}}$ are the reals $\mathbb{R}$; the
distortion functions $d_d$ and $d_e$ are quadratic

$d_d(x, \hat{x}_d) = (x - \hat{x}_d)^2$, (66)

$d_e(\hat{x}_d, \hat{x}_e) = (\hat{x}_d - \hat{x}_e)^2$; (67)

and the source and side-information pair $(X, Y)$ is a centered
bivariate Gaussian, where $X$ is of variance

$\sigma_X^2 > 0$ (68)

and $Y = \xi X + U$ for some centered Gaussian $U$ that is
independent of $X$ and that is of variance $\sigma_U^2$ and where $\xi$
is a nonzero constant.³ The rate-distortions function depends
on $\xi$ only through the ratio $\sigma_U^2 / \xi^2$, because the receiver can
premultiply its side information by $\xi^{-1}$ without affecting the
rate-distortions function. In the following we thus assume that
$\xi = 1$, i.e.,

$Y = X + U$. (69)

We denote the rate-distortions function for this setup by

$R^{G}(D_d, D_e)$.



2 If arg min is not unique, $x_{\setminus i}^{*}(x_i, z_i)$ is defined as the first in
lexicographical order.

3 The problem is not interesting when $\xi$ is zero, because in this case the
side information is independent of the source and is thus irrelevant.
When $\sigma_U^2$ is zero the problem is not interesting, because
in this case the source sequence is determined by the side
information, and $R^{G}(D_d, D_e)$ is thus zero for all nonnegative
values of $D_d$ and $D_e$. We shall henceforth thus assume

$\sigma_U^2 > 0$. (70)

In this case, no finite rate can allow $D_d$ to be zero (even if
we ignore the encoder-side reconstruction constraint). Thus,
we shall also assume

$D_d > 0$. (71)

B. Related Work 

As we have seen in Section II-B, the Wyner-Ziv setup is
obtained from ours if the encoder-side reconstruction constraint
(12) is omitted, and Steinberg's common-reconstruction
setup is obtained if (12) is replaced by (20).

For a Gaussian source and quadratic distortion measures,
Steinberg's common-reconstruction rate-distortion function is
[2]

$R^{G}_{\mathrm{cr}}(D_d) = \frac{1}{2} \log^+ \frac{\sigma_X^2 (\sigma_U^2 + D_d)}{(\sigma_X^2 + \sigma_U^2) D_d}$. (72)

The Wyner-Ziv rate-distortion function is [1]

$R^{G}_{\mathrm{WZ}}(D_d) = \frac{1}{2} \log^+ \frac{\sigma_X^2 \sigma_U^2}{(\sigma_X^2 + \sigma_U^2) D_d}$. (73)

This is the rate-distortion function even if the side information
is revealed not only to the decoder but also to the encoder.

C. Result 

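Theorem 9: For a Gaussian source and quadratic distortion
measures, the rate-distortions function $R^{G}(D_d, D_e)$ can be
expressed as follows:

If $\sqrt{D_e \sigma_U^2} \ge \min\Big\{ D_d, \frac{\sigma_X^2 \sigma_U^2}{\sigma_X^2 + \sigma_U^2} \Big\}$, then

$R^{G}(D_d, D_e) = \frac{1}{2} \log^+ \frac{\sigma_X^2 \sigma_U^2}{(\sigma_X^2 + \sigma_U^2) D_d}$.

If $\sqrt{D_e \sigma_U^2} < \min\Big\{ D_d, \frac{\sigma_X^2 \sigma_U^2}{\sigma_X^2 + \sigma_U^2} \Big\}$, then

$R^{G}(D_d, D_e) = \frac{1}{2} \log^+ \Big( \frac{\sigma_X^2}{\sigma_X^2 + \sigma_U^2} \cdot \frac{\sigma_U^2 + D_d - 2\sqrt{\sigma_U^2 D_e}}{D_d - D_e} \Big)$.

Proof: The direct part is proved in Section III-D and the
converse in Section III-E. ■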
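
The two-case expression of Theorem 9 can be evaluated numerically. The sketch below assumes the reading of the theorem in which both cases carry a $\log^+$ and the threshold is $\min\{D_d, \sigma_X^2\sigma_U^2/(\sigma_X^2+\sigma_U^2)\}$; the function name and argument names are hypothetical. Rates are in bits, since the paper's logarithms are base 2.

```python
import math


def gaussian_rate(sigma_x2, sigma_u2, d_d, d_e):
    """Evaluate R^G(D_d, D_e) in bits for Y = X + U (sketch of Theorem 9)."""
    if sigma_x2 <= 0 or sigma_u2 <= 0 or d_d <= 0 or d_e < 0:
        raise ValueError("need sigma_x2, sigma_u2, d_d > 0 and d_e >= 0")
    # MMSE of estimating X from Y alone.
    mmse = sigma_x2 * sigma_u2 / (sigma_x2 + sigma_u2)
    if math.sqrt(d_e * sigma_u2) >= min(d_d, mmse):
        # Encoder-side constraint is not binding: Wyner-Ziv expression (73).
        ratio = mmse / d_d
    else:
        # Encoder-side constraint is binding; here d_d > d_e holds automatically.
        ratio = (sigma_x2 / (sigma_x2 + sigma_u2)
                 * (sigma_u2 + d_d - 2.0 * math.sqrt(sigma_u2 * d_e))
                 / (d_d - d_e))
    return max(0.5 * math.log2(ratio), 0.0)   # log^+ keeps the rate nonnegative
```

With $\sigma_X^2 = \sigma_U^2 = 1$ and $D_d = 0.25$, setting $D_e = 0$ reproduces the common-reconstruction value from (72), while any $D_e \ge 0.0625$ gives the Wyner-Ziv value from (73).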
Remark 10: If $D_e = 0$, then our rate-distortions function
$R^{G}(D_d, 0)$ coincides with Steinberg's common-reconstruction
rate-distortion function $R^{G}_{\mathrm{cr}}(D_d)$ of (72):

$R^{G}(D_d, D_e)\big|_{D_e = 0} = R^{G}_{\mathrm{cr}}(D_d)$. (74)



Remark 11: If $D_d$ and $D_e$ are such that

$\sqrt{D_e \sigma_U^2} \ge \min\Big\{ D_d, \frac{\sigma_X^2 \sigma_U^2}{\sigma_X^2 + \sigma_U^2} \Big\}$ (75)

or

$\sigma_X^2 \Big( 1 - \sqrt{D_e / \sigma_U^2} \Big)^2 \le D_d - D_e$, (76)

then $R^{G}(D_d, D_e)$ coincides with Wyner and Ziv's rate-distortion
function $R^{G}_{\mathrm{WZ}}(D_d)$ in (73). Thus, if (75) or (76)
holds, then relaxing Constraint (12) and/or revealing the side
information also to the encoder does not decrease the rate-distortions
function.

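D. The Direct Part of Theorem 9

In the two cases that we shall describe in (77) and (80)
ahead, no encoding is necessary because the encoder and the
decoder can produce sufficiently good reconstructions $\hat{X}_e^n$ and
$\hat{X}_d^n$ based solely on their observed sequences $X^n$ and $Y^n$. In
these cases $R^{G}(D_d, D_e)$ is thus zero.

1) If

$\sqrt{D_e \sigma_U^2} \ge \min\Big\{ D_d, \frac{\sigma_X^2 \sigma_U^2}{\sigma_X^2 + \sigma_U^2} \Big\}$ (77a)

and

$D_d \ge \frac{\sigma_X^2 \sigma_U^2}{\sigma_X^2 + \sigma_U^2}$, (77b)

then the encoder and decoder can produce the sequences

$\hat{X}_e^n = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_U^2} X^n$ (78)

$\hat{X}_d^n = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_U^2} Y^n$ (79)

which satisfy the distortion constraints.
2) If

$\sqrt{D_e \sigma_U^2} < \min\Big\{ D_d, \frac{\sigma_X^2 \sigma_U^2}{\sigma_X^2 + \sigma_U^2} \Big\}$ (80a)

and

$D_d \ge \sigma_X^2 \Big( 1 - \sqrt{D_e / \sigma_U^2} \Big)^2 + D_e$, (80b)

then the encoder and decoder can produce the sequences

$\hat{X}_e^n = \sqrt{D_e / \sigma_U^2}\, X^n$ (81)

$\hat{X}_d^n = \sqrt{D_e / \sigma_U^2}\, Y^n$ (82)

which satisfy the distortion constraints.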
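
Both zero-rate cases use reconstructions of the form $\hat{X}_e^n = cX^n$ and $\hat{X}_d^n = cY^n$ for a scalar $c$. As a sketch, the per-symbol distortions of such a pair follow directly from $Y = X + U$ with $X$ and $U$ independent; the function name below is a hypothetical choice, and the two values of $c$ mentioned afterwards are the scalings read from (78)-(79) and (81)-(82).

```python
def scaled_estimator_distortions(sigma_x2, sigma_u2, c):
    """Distortions when the encoder outputs c*X and the decoder outputs c*Y.

    Returns (E[(X - cY)^2], E[(cY - cX)^2]) for Y = X + U with X, U independent.
    """
    # X - cY = (1 - c) X - c U
    decoder_distortion = (1.0 - c) ** 2 * sigma_x2 + c ** 2 * sigma_u2
    # cY - cX = c U
    encoder_distortion = c ** 2 * sigma_u2
    return decoder_distortion, encoder_distortion
```

With $c = \sigma_X^2 / (\sigma_X^2 + \sigma_U^2)$ the decoder distortion equals $\sigma_X^2\sigma_U^2/(\sigma_X^2+\sigma_U^2)$, and with $c = \sqrt{D_e/\sigma_U^2}$ the encoder distortion equals $D_e$ and the decoder distortion equals the right-hand side of (80b).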

The achievability of Theorem 9 in the remaining cases will
be established using the following proposition with a judicious
choice of the parameters.

Proposition 12: For the setup in Section III-A of a Gaussian
source and quadratic distortion measures, the tuple
$(R, D_d, D_e)$ is achievable whenever

$R \ge \frac{1}{2} \log \frac{\sigma_X^2 \sigma_U^2 + \sigma_X^2 \sigma_W^2 + \sigma_U^2 \sigma_W^2}{(\sigma_X^2 + \sigma_U^2) \sigma_W^2}$ (83)

for some parameters $\sigma_W^2 > 0$, $a > 0$, and $b \ge 0$ satisfying

$(1 - a - b)^2 \sigma_X^2 + a^2 \sigma_W^2 + b^2 \sigma_U^2 \le D_d$ (84a)

and

$b^2 \sigma_U^2 \le D_e$. (84b)

Thus,

$R^{G}(D_d, D_e) \le \min_{a, b, \sigma_W^2} \frac{1}{2} \log \frac{\sigma_X^2 \sigma_U^2 + \sigma_X^2 \sigma_W^2 + \sigma_U^2 \sigma_W^2}{(\sigma_X^2 + \sigma_U^2) \sigma_W^2}$ (85)

where the minimization is over all $\sigma_W^2 > 0$, $a > 0$, and $b \ge 0$
satisfying (84).

Proof: See Appendix C. ■

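We can now prove the achievability part of Theorem 9 for
the remaining cases.

3) If

$\sqrt{D_e \sigma_U^2} \ge \min\Big\{ D_d, \frac{\sigma_X^2 \sigma_U^2}{\sigma_X^2 + \sigma_U^2} \Big\}$ (86a)

and

$D_d < \frac{\sigma_X^2 \sigma_U^2}{\sigma_X^2 + \sigma_U^2}$, (86b)

then the choice

$\sigma_W^2 = \frac{\frac{\sigma_X^2 \sigma_U^2}{\sigma_X^2 + \sigma_U^2} D_d}{\frac{\sigma_X^2 \sigma_U^2}{\sigma_X^2 + \sigma_U^2} - D_d}$ (87a)

(which is positive by (86b)) and

$a = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_W^2} (1 - b)$, (87b)

$b = \frac{D_d}{\sigma_U^2}$ (87c)

satisfies (84) because

$(1 - a - b)^2 \sigma_X^2 + a^2 \sigma_W^2 + b^2 \sigma_U^2 = (1 - b)^2 \frac{\sigma_X^2 \sigma_W^2}{\sigma_X^2 + \sigma_W^2} + b^2 \sigma_U^2 = D_d$ (90)

and

$b^2 \sigma_U^2 = \frac{D_d^2}{\sigma_U^2} \le D_e$, (91)

where the inequality holds because, by (86), $D_d \le \sqrt{D_e \sigma_U^2}$.

Moreover, for this choice,

$\frac{1}{2} \log \frac{\sigma_X^2 \sigma_U^2 + \sigma_X^2 \sigma_W^2 + \sigma_U^2 \sigma_W^2}{(\sigma_X^2 + \sigma_U^2) \sigma_W^2} = \frac{1}{2} \log \frac{\sigma_X^2 \sigma_U^2}{(\sigma_X^2 + \sigma_U^2) D_d}$. (92)

Thus, by (90)-(92) and by Proposition 12 we conclude
that when $D_d$ and $D_e$ satisfy (86),

$R^{G}(D_d, D_e) \le \frac{1}{2} \log \frac{\sigma_X^2 \sigma_U^2}{(\sigma_X^2 + \sigma_U^2) D_d}$. (93)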



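4) If

$\sqrt{D_e \sigma_U^2} < \min\Big\{ D_d, \frac{\sigma_X^2 \sigma_U^2}{\sigma_X^2 + \sigma_U^2} \Big\}$ (94a)

and

$D_d < \sigma_X^2 \Big( 1 - \sqrt{D_e / \sigma_U^2} \Big)^2 + D_e$, (94b)

then we consider the choice

$b = \sqrt{D_e / \sigma_U^2}$, (95a)

$a = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_W^2} (1 - b)$, (95b)

$\sigma_W^2 = \frac{\sigma_X^2 (D_d - b^2 \sigma_U^2)}{\sigma_X^2 (1 - b)^2 + b^2 \sigma_U^2 - D_d} = \frac{\sigma_X^2 (D_d - D_e)}{\sigma_X^2 \Big( 1 - \sqrt{D_e / \sigma_U^2} \Big)^2 + D_e - D_d}$. (95c)

To see that the RHS of (95c) is positive note that (94b)
implies that the denominator is positive, and (94a)
implies that the numerator is positive because

$\sqrt{D_e \sigma_U^2} < \min\Big\{ D_d, \frac{\sigma_X^2 \sigma_U^2}{\sigma_X^2 + \sigma_U^2} \Big\} \implies D_e < \min\{ \sigma_U^2, D_d \}$. (96)

(Since $\sigma_X^2 / (\sigma_X^2 + \sigma_U^2)$ is smaller than one, the LHS of
(96) implies that $D_e < \sigma_U^2$. This, and the fact that the
LHS of (96) also implies that $D_e \sigma_U^2 < D_d^2$, demonstrates
that the LHS of (96) also implies that $D_e < D_d$.)
This choice satisfies (84) because

$(1 - a - b)^2 \sigma_X^2 + a^2 \sigma_W^2 + b^2 \sigma_U^2 = (1 - b)^2 \frac{\sigma_X^2 \sigma_W^2}{\sigma_X^2 + \sigma_W^2} + b^2 \sigma_U^2 = (D_d - b^2 \sigma_U^2) + b^2 \sigma_U^2 = D_d$ (100)

and

$b^2 \sigma_U^2 = D_e$. (101)

Moreover, for this choice,

$\frac{1}{2} \log \frac{\sigma_X^2 \sigma_U^2 + \sigma_X^2 \sigma_W^2 + \sigma_U^2 \sigma_W^2}{(\sigma_X^2 + \sigma_U^2) \sigma_W^2} = \frac{1}{2} \log \frac{\sigma_X^2 \big( \sigma_U^2 + D_d - 2\sqrt{\sigma_U^2 D_e} \big)}{(\sigma_X^2 + \sigma_U^2)(D_d - D_e)}$. (102)

Thus, by (100)-(102) and by Proposition 12 we conclude
that when (94) holds,

$R^{G}(D_d, D_e) \le \frac{1}{2} \log \frac{\sigma_X^2 \big( \sigma_U^2 + D_d - 2\sqrt{\sigma_U^2 D_e} \big)}{(\sigma_X^2 + \sigma_U^2)(D_d - D_e)}$. (103)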
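
The parameter choices for the two nonzero-rate cases can be checked numerically against Proposition 12. The sketch below assumes the following reading of those choices: $b = D_d/\sigma_U^2$ with $1/\sigma_W^2 = 1/D_d - 1/\sigma_X^2 - 1/\sigma_U^2$ in case 3, and $b = \sqrt{D_e/\sigma_U^2}$ with $\sigma_W^2 = \sigma_X^2(D_d - D_e)/(\sigma_X^2(1-b)^2 + D_e - D_d)$ in case 4, with $a = \sigma_X^2(1-b)/(\sigma_X^2+\sigma_W^2)$ in both. These formulas and all function names are assumptions of this illustration and should be compared with the original typeset equations.

```python
import math


def proposition12_parameters(sigma_x2, sigma_u2, d_d, d_e):
    """Return (sigma_w2, a, b) for cases 3 and 4 of the direct part (sketch)."""
    mmse = sigma_x2 * sigma_u2 / (sigma_x2 + sigma_u2)
    if math.sqrt(d_e * sigma_u2) >= min(d_d, mmse):
        # Case 3: needs 0 < d_d < mmse, otherwise the rate is zero.
        if not 0.0 < d_d < mmse:
            raise ValueError("zero-rate regime: no test channel needed")
        b = d_d / sigma_u2
        sigma_w2 = d_d * mmse / (mmse - d_d)
    else:
        # Case 4: needs a positive denominator, otherwise the rate is zero.
        b = math.sqrt(d_e / sigma_u2)
        denom = sigma_x2 * (1.0 - b) ** 2 + d_e - d_d
        if denom <= 0.0:
            raise ValueError("zero-rate regime: no test channel needed")
        sigma_w2 = sigma_x2 * (d_d - d_e) / denom
    a = sigma_x2 / (sigma_x2 + sigma_w2) * (1.0 - b)
    return sigma_w2, a, b


def proposition12_point(sigma_x2, sigma_u2, sigma_w2, a, b):
    """Rate bound of Proposition 12 (bits) and the two distortions it refers to."""
    rate = 0.5 * math.log2(
        (sigma_x2 * sigma_u2 + sigma_x2 * sigma_w2 + sigma_u2 * sigma_w2)
        / ((sigma_x2 + sigma_u2) * sigma_w2))
    decoder_distortion = ((1.0 - a - b) ** 2 * sigma_x2
                          + a ** 2 * sigma_w2 + b ** 2 * sigma_u2)
    encoder_distortion = b ** 2 * sigma_u2
    return rate, decoder_distortion, encoder_distortion
```

For $\sigma_X^2 = \sigma_U^2 = 1$ and $D_d = 0.25$, the pair $D_e = 1$ falls in case 3 and the pair $D_e = 0.01$ in case 4; in both, the decoder distortion returned by `proposition12_point` equals $D_d$ up to rounding.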



Remark 13: The expressions in Proposition 12 and their
relation to (25) become more transparent when we define

$Z = a(X + W)$ (104a)
$\hat{X}_d = bY + Z$ (104b)
$\hat{X}_e = bX + Z$ (104c)

for $a > 0$, $b \ge 0$, and $W$ a centered Gaussian of positive
variance $\sigma_W^2$ independent of the pair $(X, Y)$. With these
definitions

$I(X; Z | Y) = \frac{1}{2} \log \frac{\sigma_X^2 \sigma_U^2 + \sigma_X^2 \sigma_W^2 + \sigma_U^2 \sigma_W^2}{\sigma_X^2 \sigma_W^2 + \sigma_U^2 \sigma_W^2}$ (105a)

$E\big[ (X - \hat{X}_d)^2 \big] = (1 - a - b)^2 \sigma_X^2 + a^2 \sigma_W^2 + b^2 \sigma_U^2$ (105b)

$E\big[ (\hat{X}_d - \hat{X}_e)^2 \big] = b^2 \sigma_U^2$. (105c)

Since $Z -\!\!\circ\!\!- X -\!\!\circ\!\!- Y$ for all choices of the parameters $a > 0$,
$b \ge 0$, $\sigma_W^2 > 0$, we can also rewrite (85) as:

$R^{G}(D_d, D_e) \le \min_{Z, \hat{X}_d, \hat{X}_e} I(X; Z | Y)$ (106)

where the minimum is over all $Z$, $\hat{X}_d$, $\hat{X}_e$ that are of the form
in (104) and satisfy the distortion constraints

$E\big[ (X - \hat{X}_d)^2 \big] \le D_d$, (107)

$E\big[ (\hat{X}_d - \hat{X}_e)^2 \big] \le D_e$. (108)
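
The second-moment expressions (105b) and (105c) can be sanity-checked by simulation. The sketch below draws independent centered Gaussians $X$, $U$, $W$, forms $Y$, $Z$, $\hat{X}_d$, $\hat{X}_e$ as in (104), and returns the empirical mean-squared errors. The function name, the sample size, and the fixed seed are arbitrary choices made for this illustration.

```python
import random


def simulate_remark13(sigma_x2, sigma_u2, sigma_w2, a, b, n=100_000, seed=0):
    """Empirical E[(X - X_d)^2] and E[(X_d - X_e)^2] for the construction (104)."""
    rng = random.Random(seed)
    sd_x, sd_u, sd_w = sigma_x2 ** 0.5, sigma_u2 ** 0.5, sigma_w2 ** 0.5
    sum_dec = sum_enc = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, sd_x)
        u = rng.gauss(0.0, sd_u)
        w = rng.gauss(0.0, sd_w)
        y = x + u              # side information, as in (69)
        z = a * (x + w)        # auxiliary variable, as in (104a)
        x_dec = b * y + z      # decoder reconstruction, as in (104b)
        x_enc = b * x + z      # encoder estimate, as in (104c)
        sum_dec += (x - x_dec) ** 2
        sum_enc += (x_dec - x_enc) ** 2
    return sum_dec / n, sum_enc / n
```

For $\sigma_X^2 = \sigma_U^2 = 1$, $\sigma_W^2 = 0.5$, $a = 0.5$, $b = 0.25$, the closed forms give $0.25$ and $0.0625$, and the empirical averages should agree up to sampling noise.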



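E. The Converse for Theorem 9

If

$\sqrt{D_e \sigma_U^2} \ge \min\Big\{ D_d, \frac{\sigma_X^2 \sigma_U^2}{\sigma_X^2 + \sigma_U^2} \Big\}$

then the converse follows by relaxing the constraint (12); see
Remark 11. We thus focus on the case where

$\sqrt{D_e \sigma_U^2} < \min\Big\{ D_d, \frac{\sigma_X^2 \sigma_U^2}{\sigma_X^2 + \sigma_U^2} \Big\}$. (109)

We define the function $R_{\mathrm{cnt}} : \mathbb{R}_{++} \times \mathbb{R}_+ \to \mathbb{R}_+$ like $\bar{R}(\cdot,\cdot)$
except that its first argument ($D_d$) is strictly positive; the
minimum is replaced by an infimum; and the size of the
auxiliary alphabet $\mathcal{Z}$ can be unbounded. Thus,

$R_{\mathrm{cnt}}(D_d, D_e) = \inf I(X; Z | Y)$ (110)

where the infimum is over all choices⁴ of the random variable
$Z$ and functions $\phi$, $\psi$ satisfying

$E\big[ (X - \hat{X}_d)^2 \big] \le D_d$, (111a)

$E\big[ (\hat{X}_d - \hat{X}_e)^2 \big] \le D_e$, (111b)

$Z -\!\!\circ\!\!- X -\!\!\circ\!\!- Y$, (111c)

where

$\hat{X}_d \triangleq \phi(Y, Z)$, (111d)

$\hat{X}_e \triangleq \psi(X, Z)$. (111e)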

In analogy to Proposition 5 we have:

Lemma 14: Over $\mathbb{R}_{++} \times \mathbb{R}_+$ the function $R_{\mathrm{cnt}}(D_d, D_e)$ is
finite; monotonic in each of its arguments; and convex.

Proof: The function is bounded by the rate-distortion
function of the Gaussian source without side information. The
proof of monotonicity is identical to the proof of monotonicity
in Proposition 5. The proof of convexity is also very similar;
only a minor change is needed to account for the fact that,
prima facie, the infimum need not be achieved. ■

The following lemma provides an explicit expression for
$R_{\mathrm{cnt}}(D_d, D_e)$ when (109) holds.

Lemma 15: If $D_d > 0$ and $D_e \ge 0$ satisfy (109), then

$R_{\mathrm{cnt}}(D_d, D_e) = \frac{1}{2} \log^+ \Big( \frac{\sigma_X^2}{\sigma_X^2 + \sigma_U^2} \cdot \frac{\sigma_U^2 + D_d - 2\sqrt{\sigma_U^2 D_e}}{D_d - D_e} \Big)$. (112)



Proof of Lemma [73} We first prove 



(112) 



Rcnt{D d ,D e ) < - log* 



0% crfr + Dd- 2y/afjD c 



•x 



Da - D r 



To this end, we present a choice for $Z$, $\hat{X}_d$, $\hat{X}_e$ that satisfies the constraints (111) and is such that the objective function $I(X;Z|Y)$ in (110) evaluates to the RHS of (113). Our choice depends on whether

\[ D_d \ge \sigma_X^2\Bigl(1-\sqrt{D_e/\sigma_U^2}\Bigr)^2 + D_e \tag{114} \]

or

\[ D_d < \sigma_X^2\Bigl(1-\sqrt{D_e/\sigma_U^2}\Bigr)^2 + D_e. \tag{115} \]

In the first case (114) the RHS of (113) evaluates to 0, whereas in the second case (115) it is positive.

When $D_d$ and $D_e$ satisfy (114), a suitable choice is, as in (81) and (82) in the proof of the direct part,



XL 





(116) 



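When $D_d$ and $D_e$ satisfy (115), a suitable choice is, as in (95) and (104) in the direct part,

\[ Z = a(X + W), \quad \hat{X}_e = bX + Z, \quad \hat{X}_d = bY + Z, \tag{117} \]

where $W$ is a centered Gaussian of variance

\[ \sigma_W^2 = \frac{\sigma_X^2\,(D_d - D_e)}{\sigma_X^2\bigl(1-\sqrt{D_e/\sigma_U^2}\bigr)^2 + D_e - D_d} \]

and independent of the pair $(X, Y)$, and where $b = \sqrt{D_e/\sigma_U^2}$ and $a = \frac{\sigma_X^2}{\sigma_X^2+\sigma_W^2}(1 - b)$. That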



this choice has the desired properties follows by (flOOVdlOl 
and JTOBI . 
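The claim can be checked numerically. The following is a minimal sketch, not part of the original proof: it assumes the parameter choice $b = \sqrt{D_e/\sigma_U^2}$, $\sigma_W^2 = \sigma_X^2(D_d - D_e)/(\sigma_X^2(1-b)^2 + D_e - D_d)$, $a = \sigma_X^2(1-b)/(\sigma_X^2+\sigma_W^2)$, and evaluates the rate and the two distortions with the expressions in (105). The function names are illustrative, and rates are in bits.

```python
import math

def gaussian_test_channel(var_x, var_u, d_d, d_e):
    """Parameters (a, b, var_w) for Z = a(X+W), Xe = bX + Z, Xd = bY + Z."""
    b = math.sqrt(d_e / var_u)
    denom = var_x * (1.0 - b) ** 2 + d_e - d_d
    if denom <= 0:
        # Corresponds to the zero-rate case, where no description is needed.
        raise ValueError("zero-rate case: this choice does not apply")
    var_w = var_x * (d_d - d_e) / denom
    a = var_x / (var_x + var_w) * (1.0 - b)
    return a, b, var_w

def rate_and_distortions(var_x, var_u, a, b, var_w):
    """I(X;Z|Y) in bits and the two mean squared errors for the choice above."""
    num = var_x * var_u + var_x * var_w + var_u * var_w
    rate = 0.5 * math.log2(num / (var_w * (var_x + var_u)))
    dist_d = (1 - a - b) ** 2 * var_x + a ** 2 * var_w + b ** 2 * var_u
    dist_e = b ** 2 * var_u
    return rate, dist_d, dist_e

def closed_form_rate(var_x, var_u, d_d, d_e):
    """Closed-form expression with log+ (bits)."""
    arg = (var_x / (var_x + var_u)
           * (var_u + d_d - 2 * math.sqrt(var_u * d_e)) / (d_d - d_e))
    return max(0.0, 0.5 * math.log2(arg))
```

For instance, with $\sigma_X^2 = \sigma_U^2 = 1$, $D_d = 0.3$, and $D_e = 0.04$, both distortion constraints are met with equality and the two rate expressions agree.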

Having established (113), we now complete the proof of the lemma by proving the reverse inequality

\[ R_{\mathrm{cnt}}(D_d, D_e) \ge \frac{1}{2}\log^{+}\Bigl(\frac{\sigma_X^2}{\sigma_X^2+\sigma_U^2}\cdot\frac{\sigma_U^2 + D_d - 2\sqrt{\sigma_U^2 D_e}}{D_d - D_e}\Bigr). \tag{118} \]

Since rates are nonnegative, it suffices to prove

\[ R_{\mathrm{cnt}}(D_d, D_e) \ge \frac{1}{2}\log\Bigl(\frac{\sigma_X^2}{\sigma_X^2+\sigma_U^2}\cdot\frac{\sigma_U^2 + D_d - 2\sqrt{\sigma_U^2 D_e}}{D_d - D_e}\Bigr) \tag{119} \]

where $\log^{+}$ has been replaced by $\log$.

$^4$ To be more precise we should specify the set where $Z$ may take value, and we must restrict the functions $\phi$ and $\psi$ to be measurable. In the converse $Z$ will correspond to the tuple $(M, Y^{i-1}, Y_{i+1}^{n})$, and we can therefore restrict $\mathcal{Z}$ here to be the space where such tuples take value.

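Since the joint law of $(X, Y)$ is fixed and is a bivariate Gaussian law,

\[ I(X;Z|Y) = h(X|Y) - h(X|Y,Z) = \frac{1}{2}\log\Bigl(2\pi e\,\frac{\sigma_X^2\sigma_U^2}{\sigma_X^2+\sigma_U^2}\Bigr) - h(X|Y,Z). \tag{120} \]

Consequently, (119) is equivalent to

\[ \Omega \le \frac{1}{2}\log\Bigl(2\pi e\,\sigma_U^2\,\frac{D_d - D_e}{\sigma_U^2 + D_d - 2\sqrt{\sigma_U^2 D_e}}\Bigr) \tag{121} \]

where $\Omega$ is defined as

\[ \Omega \triangleq \sup_{Z,\phi,\psi} h(X|Y,Z) \tag{122} \]

under the same constraints (111) that define $R_{\mathrm{cnt}}(D_d, D_e)$ in (110).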

To prove (121) we first note that, since $\hat{X}_d$ is a deterministic function of $(Y, Z)$,

\begin{align}
h(X|Y,Z) &= h(X - \hat{X}_d \,|\, Y, Z, \hat{X}_d) \tag{123}\\
&= h(X - \hat{X}_d \,|\, X - \hat{X}_d + U, Z, \hat{X}_d) \tag{124}\\
&\le h(X - \hat{X}_d \,|\, X - \hat{X}_d + U) \tag{125}
\end{align}

where in the second line we recalled that $Y = X + U$ and where the last line follows because conditioning cannot increase differential entropy.

The Markov condition $Z -\!\circ\!- X -\!\circ\!- Y$ (111c) and the fact that $Y = X + U$ (69) imply that

\[ Z -\!\circ\!- X -\!\circ\!- U. \tag{126} \]

This, combined with the assumption that $U$ is independent of $X$, implies that $U$ is independent of $(X, Z)$. And since $\hat{X}_e$ is a function of $(X, Z)$,

\[ U \text{ and } (\hat{X}_e, X, Z) \text{ are independent.} \tag{127} \]

This independence implies that $U$ is independent of $(X - \hat{X}_e)$. This latter independence and the fact that $X - \hat{X}_d$ can be expressed as $-\bigl(\hat{X}_d - \hat{X}_e - (X - \hat{X}_e)\bigr)$ imply that

\[ \mathrm{Cov}(X - \hat{X}_d, U) = -\mathrm{Cov}(\hat{X}_d - \hat{X}_e, U). \tag{128} \]



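From (128), (111b), the fact that the variance of a random variable cannot exceed its second moment, and the fact that the magnitude of a correlation coefficient cannot exceed 1, it follows that

\[ \bigl|\mathrm{Cov}(X - \hat{X}_d, U)\bigr|^2 \le D_e\sigma_U^2. \tag{129} \]

From (125) and (129) we thus obtain

\[ \Omega \le \Gamma \tag{130} \]

where $\Gamma$ is defined as

\[ \Gamma \triangleq \sup h(X - \hat{X}_d \,|\, X - \hat{X}_d + U) \tag{131} \]

subject to the relaxed constraints

\[ \mathrm{Var}(X - \hat{X}_d) \le D_d, \tag{132a} \]
\[ \bigl|\mathrm{Cov}(X - \hat{X}_d, U)\bigr|^2 \le D_e\sigma_U^2. \tag{132b} \]

We now proceed to study $\Gamma$. Define

\[ A \triangleq X - \hat{X}_d \tag{133} \]

so

\[ \Gamma = \sup_{A} h(A \,|\, A + U) \tag{134} \]

subject to

\[ \mathrm{Var}(A) \le D_d, \tag{135a} \]
\[ |\mathrm{Cov}(A, U)|^2 \le D_e\sigma_U^2. \tag{135b} \]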



By the conditional max-entropy theorem [9], the supremum in (134) is achieved when $(A, U)$ are jointly Gaussian, as we henceforth assume. As we next argue, the lemma's hypothesis that (109) holds implies that the choice of $A$ as $-U$ is not in the feasible set. Indeed, with this choice $|\mathrm{Cov}(A, U)|^2$ is equal to $\sigma_U^4$, which violates (135b) because the lemma's hypotheses imply

\[ D_e < \min\{\sigma_U^2, D_d\}. \tag{136} \]

We thus assume in the following that $A$ is jointly Gaussian with $U$ and that $A \ne -U$. Consequently,

\begin{align}
h(A \,|\, A + U) &= \frac{1}{2}\log\Bigl(2\pi e\Bigl(\sigma_U^2 - \frac{(\sigma_U^2 + \kappa_{AU})^2}{\sigma_A^2 + \sigma_U^2 + 2\kappa_{AU}}\Bigr)\Bigr) \tag{137}\\
&= \frac{1}{2}\log\Bigl(2\pi e\,\frac{\sigma_A^2\sigma_U^2 - \kappa_{AU}^2}{\sigma_A^2 + \sigma_U^2 + 2\kappa_{AU}}\Bigr) \tag{138}
\end{align}

where $\sigma_A^2 = \mathrm{Var}(A)$ and $\kappa_{AU} = \mathrm{Cov}(A, U)$.



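We can thus rewrite the optimization problem in (131) as

\[ \Gamma = \sup_{\sigma_A^2,\,\kappa_{AU}} \frac{1}{2}\log\Bigl(2\pi e\,\frac{\sigma_A^2\sigma_U^2 - \kappa_{AU}^2}{\sigma_A^2 + \sigma_U^2 + 2\kappa_{AU}}\Bigr) \tag{139} \]

subject to

\begin{align}
0 \le \sigma_A^2 &\le D_d, \tag{140}\\
0 \le |\kappa_{AU}|^2 &\le D_e\sigma_U^2, \tag{141}\\
0 \le |\kappa_{AU}|^2 &\le \sigma_A^2\sigma_U^2. \tag{142}
\end{align}

(We have to add the last constraint because the magnitude of a correlation coefficient cannot exceed one.) For fixed $\kappa_{AU}$, the objective function in (139) is monotonically increasing in $\sigma_A^2$ (see also (137)), and so is the RHS of Constraint (142). Therefore, it is optimal to choose in (139)

\[ \sigma_A^2 = D_d. \tag{143} \]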



Substituting this choice in (139) and (142) yields

\[ \Gamma = \sup_{\kappa_{AU}} \frac{1}{2}\log\Bigl(2\pi e\,\frac{D_d\sigma_U^2 - \kappa_{AU}^2}{D_d + \sigma_U^2 + 2\kappa_{AU}}\Bigr) \tag{144} \]

subject to (141) and

\[ 0 \le |\kappa_{AU}|^2 \le D_d\sigma_U^2. \tag{145} \]

Notice that, whenever (109) holds, the RHS of (141) is upper-bounded by the square of $\min\{D_d, \sigma_U^2\}$. Consequently,

\[ \bigl((109) \text{ and } (141)\bigr) \implies \bigl(|\kappa_{AU}| \le \min\{D_d, \sigma_U^2\}\bigr). \tag{146} \]

Since the RHS of (146) implies (145),

\[ \bigl((109) \text{ and } (141)\bigr) \implies (145), \tag{147} \]

and Constraint (145) is redundant. We therefore ignore Constraint (145) and study the maximization in (144) subject to (141) only.

To this end, we compute the derivative of the objective function in (144) with respect to $\kappa_{AU}$:

\[ \frac{\partial}{\partial\kappa_{AU}}\,\frac{1}{2}\log\Bigl(2\pi e\,\frac{D_d\sigma_U^2 - \kappa_{AU}^2}{D_d + \sigma_U^2 + 2\kappa_{AU}}\Bigr) = \frac{-(D_d + \kappa_{AU})(\sigma_U^2 + \kappa_{AU})}{(D_d + \sigma_U^2 + 2\kappa_{AU})(D_d\sigma_U^2 - \kappa_{AU}^2)}. \tag{148} \]

By (146), the derivative in (148) is negative for all feasible $\kappa_{AU}$. Hence, the objective function in (144) is decreasing on the (symmetric) interval of interest (141), and it is optimal to choose

\[ \kappa_{AU} = -\sqrt{D_e\sigma_U^2}. \tag{149} \]

The optimality of this choice allows us to evaluate $\Gamma$ via (144) and hence to upper-bound $\Omega$ via (130). This yields the desired bound (121), which establishes the lemma. ■
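The last optimization step can be illustrated numerically. The sketch below is only a sanity check under the stated assumptions, not part of the proof: it evaluates the objective $\frac12\log\bigl(2\pi e\,(D_d\sigma_U^2-\kappa^2)/(D_d+\sigma_U^2+2\kappa)\bigr)$ on a grid of covariances $\kappa$ with $\kappa^2 \le D_e\sigma_U^2$ and returns the maximizer. The function names and the grid size are illustrative; logarithms are base 2.

```python
import math

def objective(kappa, d_d, var_u):
    """(1/2) log2 of 2*pi*e*(D_d*var_u - kappa^2) / (D_d + var_u + 2*kappa)."""
    ratio = (d_d * var_u - kappa ** 2) / (d_d + var_u + 2 * kappa)
    return 0.5 * math.log2(2 * math.pi * math.e * ratio)

def maximize_over_kappa(d_d, d_e, var_u, steps=2001):
    """Grid search over the symmetric interval kappa^2 <= D_e * var_u."""
    k_max = math.sqrt(d_e * var_u)
    grid = [-k_max + 2 * k_max * i / (steps - 1) for i in range(steps)]
    best = max(grid, key=lambda k: objective(k, d_d, var_u))
    return best, objective(best, d_d, var_u)

def closed_form_bound(d_d, d_e, var_u):
    """Value of the objective at kappa = -sqrt(D_e * var_u)."""
    ratio = var_u * (d_d - d_e) / (var_u + d_d - 2 * math.sqrt(var_u * d_e))
    return 0.5 * math.log2(2 * math.pi * math.e * ratio)
```

With $D_d = 0.3$, $D_e = 0.04$, and $\sigma_U^2 = 1$, the grid maximizer is the left endpoint $\kappa = -0.2$, in agreement with the monotonicity argument.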
Proof of Converse when (109) holds: Using Lemma 14 and Lemma 15 we can follow the steps of the proof in Section II-D of the converse part of Theorem 6. The remaining technicality is continuity. Continuity in the interior, i.e., on $\mathbb{R}_{++}\times\mathbb{R}_{++}$, follows from convexity. It thus only remains to establish continuity when $D_d > 0$, (109) holds, and $D_e$ is zero. This can be done by inspecting (112). ■

IV. More and More-General Constraints 

So far we have only studied settings with two distortion functions, one of which (the decoder-side distortion function $d_d(x, \hat{x}_d)$) depends on the source symbol and the decoder's reconstruction, and the other (the encoder-side distortion function $d_e(\hat{x}_d, \hat{x}_e)$) depends on the decoder's and the encoder's reconstruction symbols. In this section we extend our setting to allow for more than two distortion functions and to allow for distortions that depend on all three symbols: the source symbol $x$, the decoder's reconstruction symbol $\hat{x}_d$, and the encoder's reconstruction symbol $\hat{x}_e$. We shall also allow the reconstruction alphabets to differ. But all alphabets are assumed finite.

A. Problem Statement 

The new setup differs from the setup in Section II in two ways.

• The encoder-side reconstruction $\hat{X}_e^n$ and the decoder-side reconstruction $\hat{X}_d^n$ take value in the finite alphabets $\hat{\mathcal{X}}_e^n$ and $\hat{\mathcal{X}}_d^n$, which can be different.

• There are $K$ (possibly larger than 2) distortion constraints specified by the $K$ distortion functions

\[ d_k \colon \mathcal{X}\times\hat{\mathcal{X}}_d\times\hat{\mathcal{X}}_e \to \mathbb{R}_{+}, \quad k \in \{1,\ldots,K\} \tag{150} \]

and the corresponding $K$ maximal-allowed distortions $D_1,\ldots,D_K$ (all of which are assumed to be nonnegative).

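We say that the tuple $(R, D_1, \ldots, D_K)$ is achievable if for every $\epsilon > 0$ and sufficiently large $n$ there exist a message set $\mathcal{M}$ of size $|\mathcal{M}| \le 2^{n(R+\epsilon)}$ and functions

\begin{align}
f^{(n)} &\colon \mathcal{X}^n \to \mathcal{M} \tag{151a}\\
\phi^{(n)} &\colon \mathcal{M}\times\mathcal{Y}^n \to \hat{\mathcal{X}}_d^n \tag{151b}\\
\psi^{(n)} &\colon \mathcal{X}^n \to \hat{\mathcal{X}}_e^n \tag{151c}
\end{align}

such that the message $M = f^{(n)}(X^n)$ and the reconstruction sequences $\hat{X}_d^n = \phi^{(n)}(M, Y^n)$ and $\hat{X}_e^n = \psi^{(n)}(X^n)$ satisfy:

\[ \frac{1}{n}\sum_{i=1}^{n} E\bigl[d_k(X_i, \hat{X}_{d,i}, \hat{X}_{e,i})\bigr] \le D_k + \epsilon, \quad k \in \{1,\ldots,K\}. \tag{152} \]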
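To make the $K$ constraints concrete, the following minimal sketch computes the empirical per-symbol averages of $K$ three-argument distortion functions for one realization of the sequences. It is an illustration only: (152) constrains expectations, whereas the code averages over a single realization, and the names `average_distortions`, `decoder_side`, and `encoder_side` are hypothetical.

```python
def average_distortions(xs, xds, xes, distortion_fns):
    """Empirical averages (1/n) * sum_i d_k(x_i, xd_i, xe_i), one value per k."""
    n = len(xs)
    if n == 0 or len(xds) != n or len(xes) != n:
        raise ValueError("sequences must be nonempty and of equal length")
    return [sum(d(x, xd, xe) for x, xd, xe in zip(xs, xds, xes)) / n
            for d in distortion_fns]

def hamming(a, b):
    """Hamming distortion between two symbols."""
    return 0.0 if a == b else 1.0

def decoder_side(x, xd, xe):
    """A distortion that depends only on the source and decoder symbols."""
    return hamming(x, xd)

def encoder_side(x, xd, xe):
    """A distortion that depends only on the two reconstruction symbols."""
    return hamming(xd, xe)
```

Choosing `distortion_fns = [decoder_side, encoder_side]` recovers the two-constraint setting as a special case with $K = 2$.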



In analogy to Assumption 1 we shall assume:

Assumption 2: To each $x \in \mathcal{X}$ corresponds some $\hat{x}_d \in \hat{\mathcal{X}}_d$ and some $\hat{x}_e \in \hat{\mathcal{X}}_e$ satisfying

\[ d_k(x, \hat{x}_d, \hat{x}_e) = 0, \quad k \in \{1,\ldots,K\}. \tag{153} \]



We seek the smallest rate $R$ for which the tuple $(R, D_1, \ldots, D_K)$ is achievable. This is defined as follows. Given a maximal-allowed-distortion tuple $(D_1, \ldots, D_K)$, let

\[ \mathcal{R}_{\mathrm{Ext}}(D_1,\ldots,D_K) \triangleq \bigl\{R \in \mathbb{R}_{+} \colon (R, D_1, \ldots, D_K) \text{ is achievable}\bigr\}. \tag{154} \]

Assumption 2 implies that the set $\mathcal{R}_{\mathrm{Ext}}(D_1,\ldots,D_K)$ contains all rates exceeding $H(X|Y)$ and is thus nonempty. The rate-distortions function $R_{\mathrm{Ext}}$ can now be defined as

\[ R_{\mathrm{Ext}}(D_1,\ldots,D_K) \triangleq \min_{R \in \mathcal{R}_{\mathrm{Ext}}(D_1,\ldots,D_K)} R, \tag{155} \]

where the minimum exists because the region $\mathcal{R}_{\mathrm{Ext}}(D_1,\ldots,D_K) \subseteq \mathbb{R}_{+}$ is nonempty, closed, and bounded from below by 0.

B. Result 

To describe the rate-distortions function for the extended setup of Section IV-A we next introduce the function $\mathsf{R}_{\mathrm{Ext}}(D_1,\ldots,D_K)$.

Given the joint law $P_{XY}$ of the source and side information, and given the distortion functions $d_1, \ldots, d_K$, this function is defined as

\[ \mathsf{R}_{\mathrm{Ext}}(D_1,\ldots,D_K) \triangleq \min \bigl(I(X;Z) - I(Y;Z)\bigr) \tag{156} \]

where the minimization is over all discrete auxiliary random variables $Z$ and $U$ satisfying

\[ (U, Z) -\!\circ\!- X -\!\circ\!- Y \tag{157} \]

and over all functions $\phi\colon \mathcal{Y}\times\mathcal{Z} \to \hat{\mathcal{X}}_d$ and $\psi\colon \mathcal{X}\times\mathcal{Z}\times\mathcal{U} \to \hat{\mathcal{X}}_e$ that simultaneously satisfy the $K$ distortion constraints

\[ E\bigl[d_k\bigl(X, \phi(Y,Z), \psi(X,Z,U)\bigr)\bigr] \le D_k, \quad k \in \{1,\ldots,K\}. \tag{158} \]
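For finite alphabets the objective $I(X;Z) - I(Y;Z)$ is straightforward to evaluate for a given conditional law of $Z$ given $X$. The sketch below is an illustration under simplifying assumptions: it ignores the auxiliary variable $U$ and the distortion constraints, takes the Markov condition for granted by forming $P(x,y,z) = P_{XY}(x,y)P_{Z|X}(z|x)$, and uses hypothetical function names and dictionary-based input formats.

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """I(A;B) in bits from a dict {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

def rate_objective(p_xy, p_z_given_x):
    """I(X;Z) - I(Y;Z) when P(x, y, z) = P(x, y) * P(z | x)."""
    p_xz, p_yz = defaultdict(float), defaultdict(float)
    for (x, y), pxy in p_xy.items():
        for z, pz in p_z_given_x[x].items():
            p_xz[(x, z)] += pxy * pz
            p_yz[(y, z)] += pxy * pz
    return mutual_information(p_xz) - mutual_information(p_yz)
```

As a check, for a doubly symmetric binary source with crossover probability 0.25 and the choice $Z = X$, the objective equals $H(X|Y)$, the binary entropy of 0.25.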



The following proposition provides cardinality bounds on 
the support sets of the auxiliary random variables. 

Proposition 16 (Cardinality Bounds): The minimum defining $\mathsf{R}_{\mathrm{Ext}}(D_1,\ldots,D_K)$ is not increased if we restrict the cardinality of the support set $\mathcal{Z}$ of $Z$ to

\[ |\mathcal{Z}| \le |\mathcal{X}||\mathcal{U}| + K + 1 \tag{159} \]

and the cardinality of the support set $\mathcal{U}$ of $U$ to

\[ |\mathcal{U}| \le K. \tag{160} \]

Proof: The cardinality bound on Z can be justified using 
the convex cover method QSJ. The cardinality bound on U is 
proved in Appendix [D] ■ 

Remark 17 (Improved Cardinality Bound): The cardinality bound on $\mathcal{U}$ can be strengthened: $|\mathcal{U}|$ need not exceed the number of distortion constraints in (152) that depend on $\hat{X}_{e,i}$. The latter number equals 1 in the original setup of Section II, thus allowing us to recover Theorem 6.

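Proposition 18 (Key Properties of the Function $\mathsf{R}_{\mathrm{Ext}}$): The function $\mathsf{R}_{\mathrm{Ext}}\colon \mathbb{R}_{+}^{K} \to \mathbb{R}_{+}$ is bounded from above by $H(X|Y)$; it is nonincreasing in the distortions,

\[ \bigl(D_1' \ge D_1, \ldots, D_K' \ge D_K\bigr) \implies \bigl(\mathsf{R}_{\mathrm{Ext}}(D_1',\ldots,D_K') \le \mathsf{R}_{\mathrm{Ext}}(D_1,\ldots,D_K)\bigr); \]

and it is convex and continuous.

Proof: The proof is similar to the proof of Proposition 5 in Appendix B and is omitted. ■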
Theorem 19: The rate-distortions function for the setup in Section IV-A is equal to $\mathsf{R}_{\mathrm{Ext}}(D_1,\ldots,D_K)$:

\[ R_{\mathrm{Ext}}(D_1,\ldots,D_K) = \mathsf{R}_{\mathrm{Ext}}(D_1,\ldots,D_K). \tag{161} \]

Proof: The achievability, i.e., that

\[ R_{\mathrm{Ext}}(D_1,\ldots,D_K) \le \mathsf{R}_{\mathrm{Ext}}(D_1,\ldots,D_K), \tag{162} \]

can be proved using a scheme that is similar to the one that was sketched in the proof of Theorem 6. The only difference is that, to produce the reconstruction sequence $\hat{X}_e^n$, the encoder applies the function $\psi$ component-wise to the tuple $(X^n, Z^{*n}, U^n)$, where, conditional on $(X^n, Z^{*n})$, the components of the sequence $U^n$ are generated independently according to the conditional law $P_{U|Z,X}$. The analysis of this scheme is omitted.

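We next prove the converse, i.e., that

\[ R_{\mathrm{Ext}}(D_1,\ldots,D_K) \ge \mathsf{R}_{\mathrm{Ext}}(D_1,\ldots,D_K). \tag{163} \]

Fix some positive $\epsilon$, a blocklength $n$, and a rate $R$. Let $\mathcal{M}$ be a message set of size $|\mathcal{M}| \le 2^{n(R+\epsilon)}$, and let $f^{(n)}$, $\phi^{(n)}$, and $\psi^{(n)}$ be encoding and reconstruction functions as in (151) that satisfy the $K$ distortion constraints in (152). For every $i \in \{1,\ldots,n\}$, define $Z_i$ as in (44),

\[ Z_i \triangleq (M, Y^{i-1}, Y_{i+1}^{n}), \tag{164} \]

and define $U_i$ as

\[ U_i \triangleq (X^{i-1}, X_{i+1}^{n}). \tag{165} \]

Notice that for every $i \in \{1,\ldots,n\}$

\[ (U_i, Z_i) -\!\circ\!- X_i -\!\circ\!- Y_i. \tag{166} \]

Also, following the steps in (34)–(43), we can conclude that

\[ n(R+\epsilon) \ge \sum_{i=1}^{n}\bigl(I(X_i;Z_i) - I(Y_i;Z_i)\bigr). \tag{167} \]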



We further define, as in Section II-D, $\phi_i^{(n)}$ to be the function that maps $(M, Y^n)$ to the $i$-th symbol of $\phi^{(n)}(M, Y^n)$ and $\psi_i^{(n)}$ to be the function that maps $X^n$ to the $i$-th symbol of $\psi^{(n)}(X^n)$. Then, the symbol $\phi_i^{(n)}(M, Y^n)$ can be written as

\[ \phi_i(Y_i, Z_i) = \phi_i^{(n)}(M, Y^n), \tag{168} \]

and $\psi_i^{(n)}(X^n)$ can be written as

\[ \psi_i(X_i, Z_i, U_i) = \psi_i^{(n)}(X^n), \tag{169} \]

for some functions $\phi_i$ and $\psi_i$ with arguments in the respective domains. We finally define for each $k \in \{1,\ldots,K\}$ and $i \in \{1,\ldots,n\}$

\[ D_{k,i} \triangleq E\bigl[d_k\bigl(X_i, \phi_i^{(n)}(M, Y^n), \psi_i^{(n)}(X^n)\bigr)\bigr] \tag{170} \]

where $E[\cdot]$ is with respect to $P_{X^nY^n}$. Notice that

\[ \frac{1}{n}\sum_{i=1}^{n} D_{k,i} \le D_k + \epsilon, \quad k \in \{1,\ldots,K\} \tag{171} \]

because the chosen encoding and reconstruction functions $f^{(n)}$, $\phi^{(n)}$, and $\psi^{(n)}$ satisfy (152). Moreover, by definitions (168)–(170),

\[ E\bigl[d_k\bigl(X_i, \phi_i(Y_i, Z_i), \psi_i(X_i, Z_i, U_i)\bigr)\bigr] = D_{k,i}, \tag{172} \]

where $E[\cdot]$ is with respect to $P_{X_iY_i}P_{U_iZ_i|X_i}$.

Combining (167) and (172) with the definition of $\mathsf{R}_{\mathrm{Ext}}$, we obtain

\begin{align}
n(R+\epsilon) &\ge \sum_{i=1}^{n}\bigl(I(X_i;Z_i) - I(Y_i;Z_i)\bigr) \tag{173}\\
&\ge \sum_{i=1}^{n}\mathsf{R}_{\mathrm{Ext}}(D_{1,i},\ldots,D_{K,i}) \tag{174}\\
&\ge n\,\mathsf{R}_{\mathrm{Ext}}\Bigl(\frac{1}{n}\sum_{i=1}^{n}D_{1,i},\ldots,\frac{1}{n}\sum_{i=1}^{n}D_{K,i}\Bigr) \tag{175}\\
&\ge n\,\mathsf{R}_{\mathrm{Ext}}(D_1+\epsilon,\ldots,D_K+\epsilon), \tag{176}
\end{align}

where the last two inequalities follow by the convexity and the monotonicity of $\mathsf{R}_{\mathrm{Ext}}$ and by (171). By the continuity of $\mathsf{R}_{\mathrm{Ext}}$ and because $\epsilon > 0$ and the blocklength $n$ are arbitrary, the converse (163) follows immediately from (176). ■

Appendix A
Proof of Corollary 7

When $d_e(\cdot,\cdot)$ is the Hamming distortion and $D_e = 0$, our average-per-symbol distortion constraint (12) is less stringent than the block-distortion constraint (24) in Steinberg's setup (Remark 3). Consequently,

\[ R_{\mathrm{cr}}(D_d) \ge R(D_d, 0). \tag{177} \]

It remains to prove the reverse inequality. Let $Z$, $\phi$, and $\psi$ be minimizers of $R(D_d, 0)$, so

\begin{align}
R(D_d, 0) &= I(X;Z) - I(Y;Z) \tag{178a}\\
E[d_d(X, \phi(Y,Z))] &\le D_d \tag{178b}\\
\phi(Y,Z) &= \psi(X,Z) \quad \text{w.p. } 1 \tag{178c}\\
Z -\!\circ\!- X &-\!\circ\!- Y. \tag{178d}
\end{align}

To prove the reverse inequality we shall upper-bound $R_{\mathrm{cr}}(D_d)$ by showing that

\[ \hat{X} \triangleq \phi(Y,Z) \tag{179} \]

is feasible in the minimization (21) that defines it.

From the definition of $\hat{X}$ (179) and from (178c), it follows that $\hat{X}$ is computable (almost surely) from $(X,Z)$. This combines with (178d) to establish that

\[ (\hat{X}, Z) -\!\circ\!- X -\!\circ\!- Y \tag{180} \]

and, a fortiori, that

\[ \hat{X} -\!\circ\!- X -\!\circ\!- Y. \tag{181a} \]

And by (178b) and (179)

\[ E[d_d(X, \hat{X})] \le D_d. \tag{181b} \]

It follows from (181) that $\hat{X}$ is feasible in the minimization (21) defining $R_{\mathrm{cr}}(D_d)$ and thus

\begin{align}
R_{\mathrm{cr}}(D_d) &\le I(X;\hat{X}) - I(Y;\hat{X}) \tag{182}\\
&= I(X;\hat{X}|Y) \tag{183}\\
&\le I(X;Z|Y) \tag{184}\\
&= I(X;Z) - I(Y;Z) \tag{185}\\
&= R(D_d, 0) \tag{186}
\end{align}

where (183) follows from (181a); where (184) follows, by the (conditional) data processing inequality, from

\[ X -\!\circ\!- (Y,Z) -\!\circ\!- \hat{X} \tag{187} \]

(which holds by (179)); where (185) follows from (178d); and (186) follows from (178a). Inequalities (177) and (186) establish the corollary.



Appendix B 
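Proof of Proposition 5

That $R(D_d, D_e)$ is bounded by $H(X|Y)$ is just a restatement of (29). Monotonicity holds because the feasible set in the minimization defining $R(D_d, D_e)$ is enlarged (or is unaltered) when $D_d$ and/or $D_e$ are increased.

As to the convexity, let $Z^{(1)}, \phi^{(1)}, \psi^{(1)}$ and $Z^{(2)}, \phi^{(2)}, \psi^{(2)}$ be the random variables and functions that achieve the minima in the definitions of $R(D_d^{(1)}, D_e^{(1)})$ and $R(D_d^{(2)}, D_e^{(2)})$. Let $Q \sim \mathrm{Bernoulli}(\lambda)$ be independent of $(X, Y, Z^{(1)}, Z^{(2)})$. Define

\[ Z \triangleq (Q, Z^{(Q)}) \tag{188} \]

and the functions

\begin{align}
\phi(Y,Z) &\triangleq \phi^{(Q)}(Y, Z^{(Q)}) \tag{189}\\
\psi(X,Z) &\triangleq \psi^{(Q)}(X, Z^{(Q)}). \tag{190}
\end{align}

Then

\[ Z -\!\circ\!- X -\!\circ\!- Y; \tag{191} \]

\begin{align}
E[d_d(X, \phi(Y,Z))] \tag{192}\\
&= \lambda E[d_d(X, \phi^{(1)}(Y, Z^{(1)}))] + (1-\lambda)E[d_d(X, \phi^{(2)}(Y, Z^{(2)}))] \tag{193}\\
&\le \lambda D_d^{(1)} + (1-\lambda)D_d^{(2)}; \tag{194}
\end{align}

and

\begin{align}
E[d_e(\phi(Y,Z), \psi(X,Z))] \tag{195}\\
&= \lambda E[d_e(\phi^{(1)}(Y, Z^{(1)}), \psi^{(1)}(X, Z^{(1)}))] + (1-\lambda)E[d_e(\phi^{(2)}(Y, Z^{(2)}), \psi^{(2)}(X, Z^{(2)}))] \tag{196}\\
&\le \lambda D_e^{(1)} + (1-\lambda)D_e^{(2)}; \tag{197}
\end{align}

so $Z$, $\phi$, $\psi$ are feasible for the distortions

\[ \bigl(\lambda D_d^{(1)} + (1-\lambda)D_d^{(2)},\ \lambda D_e^{(1)} + (1-\lambda)D_e^{(2)}\bigr). \]

Consequently,

\begin{align}
R\bigl(\lambda D_d^{(1)} &+ (1-\lambda)D_d^{(2)},\ \lambda D_e^{(1)} + (1-\lambda)D_e^{(2)}\bigr)\\
&\le I(X;Z) - I(Y;Z)\\
&= H(X) - H(X|Z) - H(Y) + H(Y|Z)\\
&= H(X) - H(X|Z^{(Q)}, Q) - H(Y) + H(Y|Z^{(Q)}, Q)\\
&= H(X) - \lambda H(X|Z^{(1)}) - (1-\lambda)H(X|Z^{(2)}) - H(Y) + \lambda H(Y|Z^{(1)}) + (1-\lambda)H(Y|Z^{(2)})\\
&= \lambda\bigl(I(X;Z^{(1)}) - I(Y;Z^{(1)})\bigr) + (1-\lambda)\bigl(I(X;Z^{(2)}) - I(Y;Z^{(2)})\bigr)\\
&= \lambda R(D_d^{(1)}, D_e^{(1)}) + (1-\lambda)R(D_d^{(2)}, D_e^{(2)}). \tag{198}
\end{align}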

To conclude the proof it remains to prove that $R(D_d, D_e)$ is continuous on $\mathbb{R}_{+}^2$. (Continuity on $\mathbb{R}_{++}^2$ is a consequence of the convexity, but we also claim continuity in the closed set $\mathbb{R}_{+}^2$.) Since $\mathbb{R}_{+}^2$ is locally simplicial (as can be verified by the definition in [10, Section 10, p. 84] or using [10, Theorem 20.5, p. 184]), the convexity of $R(D_d, D_e)$ on $\mathbb{R}_{+}^2$ implies its upper-semicontinuity relative to $\mathbb{R}_{+}^2$. It thus remains to prove lower-semicontinuity relative to $\mathbb{R}_{+}^2$. That is, we need to show that

\[ (D_d^{(\kappa)}, D_e^{(\kappa)}) \to (D_d, D_e) \]

implies that there is a subsequence $\{\kappa_\nu\}$ such that

\[ R(D_d, D_e) \le \lim_{\nu\to\infty} R\bigl(D_d^{(\kappa_\nu)}, D_e^{(\kappa_\nu)}\bigr). \]

Let $\phi^{(\kappa)}$, $\psi^{(\kappa)}$, $P^{(\kappa)}_{Z|X}$ achieve $R(D_d^{(\kappa)}, D_e^{(\kappa)})$ with $\mathcal{Z} = \{1, \ldots, |\mathcal{X}| + 3\}$. Since there are only a finite number of functions from $\mathcal{Y}\times\mathcal{Z}$ to $\hat{\mathcal{X}}$ and only a finite number of functions from $\mathcal{X}\times\mathcal{Z}$ to $\hat{\mathcal{X}}$, we can choose a subsequence along which: the mappings $\phi^{(\kappa_\nu)}$ do not depend on $\nu$ and can be thus denoted $\phi$; the mappings $\psi^{(\kappa_\nu)}$ do not depend on $\nu$ and can be thus denoted $\psi$; and the conditional laws $P^{(\kappa_\nu)}_{Z|X}$ converge to some conditional law that we denote $P^{(\infty)}_{Z|X}$. By the continuity of mutual information, $R(D_d^{(\kappa_\nu)}, D_e^{(\kappa_\nu)})$ converges to $I(X;Z) - I(Y;Z)$ evaluated with respect to $P^{(\infty)}_{Z|X}P_{XY}$, and $R(D_d, D_e)$ cannot exceed this value because $P^{(\infty)}_{Z|X}$, $\psi$, and $\phi$ are in the feasible set defining it.

Appendix C
Proof of Proposition 12

We present and analyze a scheme that achieves the rate-distortions tuples in Proposition 12. Before describing the scheme, we introduce some notation and lemmas on n-dimensional spheres.

A. On n-dimensional Spheres

An n-sphere of radius $r > 0$ centered at $\xi \in \mathbb{R}^n$ is the set of all vectors $x \in \mathbb{R}^n$ satisfying

\[ \|x - \xi\| = r. \]

When the center of the sphere $\xi$ is the origin $0$, we call it a centered sphere, and when the radius of the sphere is 1, we call it a unit sphere.

We denote the angle between two nonzero vectors $u, v \in \mathbb{R}^n$ by $\angle(u, v)$. Its cosine is

\[ \cos\angle(u, v) = \frac{\langle u, v\rangle}{\|u\|\,\|v\|}. \tag{199} \]

Given a nonzero vector $\mu$ on an n-sphere $\mathcal{S}$, the spherical cap of half-angle $\theta$ centered at $\mu$ is the set of all vectors $x$ on $\mathcal{S}$ satisfying

\[ \angle(\mu, x) \le \theta. \]

The surface area of such a spherical cap does not depend on the vector $\mu$ but only on the dimension $n$, the radius of the sphere $r$, and the angle $\theta$. If the radius $r = 1$, we denote this surface area by $C_n(\theta)$.

We say that a random n-vector is uniformly distributed over an n-sphere, if it is drawn according to a uniform probability measure over the surface of this sphere.

The proofs of the following four lemmas are based on results in [11] and omitted.

Lemma 20: Let $\Psi$ be uniformly distributed over the centered unit n-sphere, and let $\mu$ be a deterministic unit-length vector in $\mathbb{R}^n$. Then,

\[ \Pr[\langle \Psi, \mu\rangle \ge \tau] = \frac{C_n(\arccos(\tau))}{C_n(\pi)}, \quad 0 \le \tau \le 1. \tag{200} \]

Lemma 21: For $0 \le \tau < 1$:

\[ \lim_{n\to\infty}\frac{1}{n}\log\Bigl(\frac{C_n(\arccos(\tau))}{C_n(\pi)}\Bigr) = \frac{1}{2}\log(1 - \tau^2). \tag{201} \]

Lemma 22: Let $f\colon \mathbb{N} \to (0, 1]$ be such that the limit

\[ \eta_1 \triangleq -\lim_{n\to\infty}\frac{1}{n}\log f(n) \tag{202} \]

exists and $\eta_1 > 0$. Then,

\[ \lim_{n\to\infty}\bigl(1 - f(n)\bigr)^{2^{n\eta_2}} = \begin{cases} 1 & \text{if } \eta_1 > \eta_2\\ 0 & \text{if } \eta_1 < \eta_2. \end{cases} \tag{203} \]

Lemma 23: For $\theta \in (0, \pi/2)$

\[ \lim_{n\to\infty}\frac{C_n(\theta)}{C_n(\pi)} = 0, \tag{204} \]

whereas for $\theta \in (\pi/2, \pi)$

\[ \lim_{n\to\infty}\frac{C_n(\theta)}{C_n(\pi)} = 1. \tag{205} \]

B. Scheme

Our scheme has parameters

\[ a,\ \delta,\ \sigma_W^2 > 0 \quad \text{and} \quad b \ge 0 \tag{206} \]

that must satisfy Conditions (84a) and (84b), which we repeat for convenience here:

\[ (1 - a - b)^2\sigma_X^2 + a^2\sigma_W^2 + b^2\sigma_U^2 \le D_d \tag{207} \]
\[ b^2\sigma_U^2 \le D_e \tag{208} \]



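To describe and analyze the scheme we use vector notation. Let $\mathbf{X}$ denote the n-dimensional column-vector that results when the source symbols are stacked on top of each other

\[ \mathbf{X} \triangleq (X_1, X_2, \ldots, X_n)^{\mathsf{T}}. \tag{209} \]

Likewise define the side-information vector $\mathbf{Y}$ and the reconstruction vectors $\hat{\mathbf{X}}_d$ and $\hat{\mathbf{X}}_e$.

1) Codebook generation: Let

\begin{align}
\sigma_Z^2 &\triangleq a^2(\sigma_X^2 + \sigma_W^2), \tag{210}\\
R' &\triangleq \frac{1}{2}\log\Bigl(\frac{\sigma_X^2 + \sigma_W^2}{\sigma_W^2}\Bigr), \tag{211}\\
R &\triangleq \frac{1}{2}\log\Bigl(\frac{\sigma_X^2\sigma_U^2 + \sigma_X^2\sigma_W^2 + \sigma_W^2\sigma_U^2}{(\sigma_X^2 + \sigma_U^2)\sigma_W^2}\Bigr). \tag{212}
\end{align}

Draw $\lceil 2^{nR'}\rceil$ independent random n-vectors $\{\mathbf{Z}(1), \mathbf{Z}(2), \ldots, \mathbf{Z}(\lceil 2^{nR'}\rceil)\}$ uniformly over the centered n-sphere of radius $r = \sqrt{n\sigma_Z^2}$. Assign these vectors to $\lfloor 2^{n(R+\delta)}\rfloor$ bins: the first $\lceil 2^{n(R'-R-\delta)}\rceil$ are assigned to bin 1, the following $\lceil 2^{n(R'-R-\delta)}\rceil$ vectors are assigned to bin 2, etc. More specifically, if $\mathcal{B}(m)$ denotes the set of vectors assigned to bin $m \in \{1, \ldots, \lfloor 2^{n(R+\delta)}\rfloor\}$, then

\[ \mathcal{B}(m) = \bigl\{\mathbf{Z}\bigl((m-1)\lceil 2^{n(R'-R-\delta)}\rceil + 1\bigr), \ldots, \mathbf{Z}\bigl(m\lceil 2^{n(R'-R-\delta)}\rceil\bigr)\bigr\} \]

for $m = 1, \ldots, \lfloor 2^{n(R+\delta)}\rfloor - 1$ and

\[ \mathcal{B}\bigl(\lfloor 2^{n(R+\delta)}\rfloor\bigr) = \bigl\{\mathbf{Z}\bigl((\lfloor 2^{n(R+\delta)}\rfloor - 1)\lceil 2^{n(R'-R-\delta)}\rceil + 1\bigr), \ldots, \mathbf{Z}\bigl(\lceil 2^{nR'}\rceil\bigr)\bigr\}. \]

The codebook is $\mathcal{C} \triangleq \{\mathbf{Z}(1), \mathbf{Z}(2), \ldots, \mathbf{Z}(\lceil 2^{nR'}\rceil)\}$.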

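2) Encoder: Given the source sequence $\mathbf{X} = \mathbf{x}$, the encoder looks for the codeword $\mathbf{z}^* \in \mathcal{C}$ that is closest to having the "correct" angle with $\mathbf{x}$:

\[ \mathbf{z}^* \triangleq \arg\min_{\mathbf{z}\in\mathcal{C}}\Bigl|\cos\angle(\mathbf{x}, \mathbf{z}) - \sqrt{1 - 2^{-2R'}}\Bigr|. \tag{213} \]

The encoder then sends $M = m^*$, where $m^*$ denotes the index of the bin containing $\mathbf{z}^*$. It also produces the reconstruction sequence $\hat{\mathbf{x}}_e = \mathbf{z}^* + b\mathbf{x}$.

3) Decoder: Given $M = m^*$ and the side-information vector $\mathbf{Y} = \mathbf{y}$, the decoder chooses

\[ \hat{\mathbf{z}} \triangleq \arg\min_{\mathbf{z}\in\mathcal{B}(m^*)}\Bigl|\cos\angle(\mathbf{y}, \mathbf{z}) - \sqrt{1 - 2^{-2(R'-R)}}\Bigr| \tag{214} \]

and produces the reconstruction sequence $\hat{\mathbf{x}}_d = \hat{\mathbf{z}} + b\mathbf{y}$.

With probability 1 the argmins in (213) and (214) are unique.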
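The three steps above can be sketched as a toy implementation. This is a hedged illustration rather than the scheme as analyzed: the blocklength is tiny, rates are in bits, and the rule `min(j // bin_size, num_bins - 1)` for mapping codeword indices to bins is a simplification chosen here so that every codeword lands in a valid bin. All function names are hypothetical.

```python
import math
import random

def cos_angle(u, v):
    """Cosine of the angle between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def sphere_point(n, radius, rng):
    """Uniform point on the centered n-sphere (normalized Gaussian vector)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in g))
    return [radius * c / norm for c in g]

def build_codebook(n, rate_prime, rate, delta, var_z, rng):
    """Draw the codewords and assign consecutive blocks of them to bins."""
    size = math.ceil(2 ** (n * rate_prime))
    bin_size = math.ceil(2 ** (n * (rate_prime - rate - delta)))
    num_bins = math.floor(2 ** (n * (rate + delta)))
    codebook = [sphere_point(n, math.sqrt(n * var_z), rng) for _ in range(size)]
    bin_of = [min(j // bin_size, num_bins - 1) for j in range(size)]
    return codebook, bin_of

def encode(x, codebook, bin_of, rate_prime, b):
    """Pick the codeword whose angle with x is closest to the target angle."""
    target = math.sqrt(1 - 2 ** (-2 * rate_prime))
    j = min(range(len(codebook)), key=lambda i: abs(cos_angle(x, codebook[i]) - target))
    x_e = [z + b * xi for z, xi in zip(codebook[j], x)]
    return bin_of[j], j, x_e

def decode(m, y, codebook, bin_of, rate_prime, rate, b):
    """Search bin m for the codeword with the target angle to y."""
    target = math.sqrt(1 - 2 ** (-2 * (rate_prime - rate)))
    members = [i for i in range(len(codebook)) if bin_of[i] == m]
    j = min(members, key=lambda i: abs(cos_angle(y, codebook[i]) - target))
    return j, [z + b * yi for z, yi in zip(codebook[j], y)]
```

At such short blocklengths the decoder will often pick a wrong codeword; the guarantees in the analysis are asymptotic in $n$.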

C. Analysis 

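We fix $\epsilon > 0$ sufficiently small such that

\[ (1 - 4\epsilon)\sqrt{1 - 2^{-2(R'-R)}} > \sqrt{1 - 2^{-2(R'-R-\delta/2)}}, \tag{215} \]

and define the following four events:

1) $\mathcal{E}_{\mathrm{src}}$: "The source and side information are atypical", i.e.,

\begin{align}
\Bigl|\tfrac{1}{n}\|\mathbf{X}\|^2 - \sigma_X^2\Bigr| &> \epsilon\sigma_X^2 \quad \text{or} \tag{216a}\\
\Bigl|\tfrac{1}{n}\|\mathbf{Y}\|^2 - \sigma_Y^2\Bigr| &> \epsilon\sigma_Y^2 \quad \text{or} \tag{216b}\\
\bigl|\cos\angle(\mathbf{X}, \mathbf{Y}) - \rho_{XY}\bigr| &> \epsilon\rho_{XY} \tag{216c}
\end{align}

where $\sigma_Y^2 = \sigma_X^2 + \sigma_U^2$ is the variance of $Y$ and $\rho_{XY}$ denotes the correlation coefficient between $X$ and $Y$:

\[ \rho_{XY} = \sqrt{\frac{\sigma_X^2}{\sigma_X^2 + \sigma_U^2}}. \tag{217} \]

2) $\mathcal{E}_{\mathrm{enc}}$: "No codeword has a good angle with the source sequence", i.e.,

\[ \Bigl|\cos\angle(\mathbf{X}, \mathbf{Z}^*) - \sqrt{1 - 2^{-2R'}}\Bigr| > \epsilon\sqrt{1 - 2^{-2R'}}. \tag{218} \]

3) $\mathcal{E}_{\mathrm{dec1}}$: "The chosen codeword $\mathbf{Z}^*$ does not have the correct angle with the side-information sequence", i.e.,

\[ \Bigl|\cos\angle(\mathbf{Y}, \mathbf{Z}^*) - \sqrt{1 - 2^{-2(R'-R)}}\Bigr| > 4\epsilon\sqrt{1 - 2^{-2(R'-R)}}. \tag{219} \]

4) $\mathcal{E}_{\mathrm{dec2}}$: "The decoder does not find the correct codeword", i.e.,

\[ \hat{\mathbf{Z}} \ne \mathbf{Z}^*. \tag{220} \]

Also, we define the event

\[ \mathcal{E} \triangleq \mathcal{E}_{\mathrm{src}} \cup \mathcal{E}_{\mathrm{enc}} \cup \mathcal{E}_{\mathrm{dec1}} \cup \mathcal{E}_{\mathrm{dec2}}. \]

Lemma 24:

\[ \lim_{n\to\infty}\Pr[\mathcal{E}] = 0. \tag{221} \]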



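Proof: We note

\[ \Pr[\mathcal{E}] \le \Pr[\mathcal{E}_{\mathrm{src}}] + \Pr[\mathcal{E}_{\mathrm{enc}} | \mathcal{E}_{\mathrm{src}}^c] + \Pr[\mathcal{E}_{\mathrm{dec1}} | \mathcal{E}_{\mathrm{src}}^c \cap \mathcal{E}_{\mathrm{enc}}^c] + \Pr[\mathcal{E}_{\mathrm{dec2}} | \mathcal{E}_{\mathrm{src}}^c \cap \mathcal{E}_{\mathrm{enc}}^c]. \tag{222} \]

In the following we show that each term on the RHS of (222) tends to zero as the blocklength $n$ tends to infinity. The first limit

\[ \lim_{n\to\infty}\Pr[\mathcal{E}_{\mathrm{src}}] = 0 \tag{223} \]

follows directly from the weak law of large numbers. The second limit

\[ \lim_{n\to\infty}\Pr[\mathcal{E}_{\mathrm{enc}} | \mathcal{E}_{\mathrm{src}}^c] = 0 \tag{224} \]

can be shown following the same steps as in the proof of Limit (134) in [12]. The third limit

\[ \lim_{n\to\infty}\Pr[\mathcal{E}_{\mathrm{dec1}} | \mathcal{E}_{\mathrm{src}}^c \cap \mathcal{E}_{\mathrm{enc}}^c] = 0 \tag{225} \]

is proved as follows. We have

\[ \cos\angle(\mathbf{Y}, \mathbf{Z}^*) = \cos\angle(\mathbf{X}, \mathbf{Y})\cos\angle(\mathbf{X}, \mathbf{Z}^*) + \frac{\langle \mathbf{Y}^{\perp}, \mathbf{Z}^{*\perp}\rangle}{\|\mathbf{Y}\|\,\|\mathbf{Z}^*\|} \tag{226} \]

where $\mathbf{Y}^{\perp}$ and $\mathbf{Z}^{*\perp}$ denote the components of $\mathbf{Y}$ and $\mathbf{Z}^*$ that are orthogonal to $\mathbf{X}$:

\begin{align}
\mathbf{Y}^{\perp} &\triangleq \mathbf{Y} - \frac{\langle \mathbf{X}, \mathbf{Y}\rangle}{\|\mathbf{X}\|^2}\mathbf{X} \tag{227}\\
&= \mathbf{Y} - \cos\angle(\mathbf{X}, \mathbf{Y})\|\mathbf{Y}\|\frac{\mathbf{X}}{\|\mathbf{X}\|} \tag{228}
\end{align}

and

\begin{align}
\mathbf{Z}^{*\perp} &\triangleq \mathbf{Z}^* - \frac{\langle \mathbf{X}, \mathbf{Z}^*\rangle}{\|\mathbf{X}\|^2}\mathbf{X} \tag{229}\\
&= \mathbf{Z}^* - \cos\angle(\mathbf{X}, \mathbf{Z}^*)\|\mathbf{Z}^*\|\frac{\mathbf{X}}{\|\mathbf{X}\|}. \tag{230}
\end{align}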
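The decomposition of $\cos\angle(\mathbf{Y}, \mathbf{Z}^*)$ into a product of cosines plus an inner product of the components orthogonal to $\mathbf{X}$ holds for arbitrary nonzero vectors, which the following short sketch checks numerically. The helper names are illustrative.

```python
import math
import random

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean norm."""
    return math.sqrt(dot(u, u))

def cos_angle(u, v):
    """Cosine of the angle between two nonzero vectors."""
    return dot(u, v) / (norm(u) * norm(v))

def orth_component(v, x):
    """Component of v orthogonal to x: v - (<x, v> / ||x||^2) x."""
    c = dot(x, v) / dot(x, x)
    return [vi - c * xi for vi, xi in zip(v, x)]

def decomposition_gap(x, y, z):
    """Difference between the two sides of the cosine decomposition."""
    lhs = cos_angle(y, z)
    rhs = (cos_angle(x, y) * cos_angle(x, z)
           + dot(orth_component(y, x), orth_component(z, x)) / (norm(y) * norm(z)))
    return lhs - rhs
```

The gap is zero up to floating-point rounding because the components of $\mathbf{Y}$ and $\mathbf{Z}^*$ along $\mathbf{X}$ contribute exactly the product of the two cosines times $\|\mathbf{Y}\|\|\mathbf{Z}^*\|$.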



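Let $t_{XZ^*}$ satisfy

\[ t_{XZ^*} \in \Bigl[(1-\epsilon)\sqrt{1 - 2^{-2R'}},\ (1+\epsilon)\sqrt{1 - 2^{-2R'}}\Bigr] \tag{231} \]

and let $\mathbf{x}$ and $\mathbf{y}$ be vectors in $\mathbb{R}^n$ satisfying

\begin{align}
\Bigl|\tfrac{1}{n}\|\mathbf{x}\|^2 - \sigma_X^2\Bigr| &\le \epsilon\sigma_X^2, \tag{232a}\\
\Bigl|\tfrac{1}{n}\|\mathbf{y}\|^2 - \sigma_Y^2\Bigr| &\le \epsilon\sigma_Y^2, \tag{232b}\\
\bigl|\cos\angle(\mathbf{x}, \mathbf{y}) - \rho_{XY}\bigr| &\le \epsilon\rho_{XY}. \tag{232c}
\end{align}

Then, conditional on events

\[ \mathcal{E}_{\mathrm{src}}^c,\ \mathcal{E}_{\mathrm{enc}}^c,\ \mathbf{X} = \mathbf{x},\ \mathbf{Y} = \mathbf{y},\ \cos\angle(\mathbf{X}, \mathbf{Z}^*) = t_{XZ^*}, \tag{233} \]

by (231) and (232c), we have

\begin{align}
\cos\angle(\mathbf{X}, \mathbf{Y})\cos\angle(\mathbf{X}, \mathbf{Z}^*) &\le (1+\epsilon)\rho_{XY}(1+\epsilon)\sqrt{1 - 2^{-2R'}}\\
&\overset{(a)}{\le} \sqrt{1 - 2^{-2(R'-R)}}\,(1 + 3\epsilon) \tag{234a}
\end{align}

and

\begin{align}
\cos\angle(\mathbf{X}, \mathbf{Y})\cos\angle(\mathbf{X}, \mathbf{Z}^*) &\ge (1-\epsilon)\rho_{XY}(1-\epsilon)\sqrt{1 - 2^{-2R'}}\\
&\overset{(a)}{\ge} \sqrt{1 - 2^{-2(R'-R)}}\,(1 - 3\epsilon), \tag{234b}
\end{align}

where Inequalities (a) follow because

\[ \rho_{XY}\cdot\sqrt{1 - 2^{-2R'}} = \sqrt{1 - 2^{-2(R'-R)}} \tag{235} \]

and because $\epsilon \in (0, 1)$. Moreover, conditional on the events in (233), the vector $\mathbf{Z}^{*\perp}$ is uniformly distributed over a centered $(n-1)$-dimensional sphere of radius $\sqrt{n\sigma_Z^2(1 - t_{XZ^*}^2)}$, and thus Limit (236) follows by Lemmas 20 and 23.

We can combine Limit (236) and Inequalities (234) to obtain the limit (237). If in (237) we take the expectation with respect to $\mathbf{X}$, $\mathbf{Y}$, and $\cos\angle(\mathbf{X}, \mathbf{Z}^*)$ (but keep the conditioning on events $\mathcal{E}_{\mathrm{src}}^c$ and $\mathcal{E}_{\mathrm{enc}}^c$), we obtain the desired third limit (225).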

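We finally prove the fourth limit

\[ \lim_{n\to\infty}\Pr[\mathcal{E}_{\mathrm{dec2}} | \mathcal{E}_{\mathrm{src}}^c \cap \mathcal{E}_{\mathrm{enc}}^c] = 0. \tag{238} \]

To this end, we define event $\mathcal{E}_2$ as

\[ \cos\angle(\mathbf{Y}, \mathbf{Z}') \le \sqrt{1 - 2^{-2(R'-R-\delta/2)}}, \quad \forall\,\mathbf{Z}' \in \bigl(\mathcal{B}(M)\setminus\{\mathbf{Z}^*\}\bigr). \tag{239} \]

Recalling the decoding rule in (214) and the definition of event $\mathcal{E}_{\mathrm{dec1}}$ in (219), we see that when $\mathcal{E}_{\mathrm{dec1}}^c$ and $\mathcal{E}_2$ occur simultaneously, then by condition (215) the decoder finds the correct codeword $\hat{\mathbf{Z}} = \mathbf{Z}^*$. Therefore,

\[ \Pr[\mathcal{E}_{\mathrm{dec2}} | \mathcal{E}_{\mathrm{src}}^c, \mathcal{E}_{\mathrm{enc}}^c] \le 1 - \Pr[\mathcal{E}_{\mathrm{dec1}}^c \cap \mathcal{E}_2 | \mathcal{E}_{\mathrm{src}}^c, \mathcal{E}_{\mathrm{enc}}^c] \tag{240} \]

and thus (225) and the limit

\[ \lim_{n\to\infty}\Pr[\mathcal{E}_2^c | \mathcal{E}_{\mathrm{src}}^c, \mathcal{E}_{\mathrm{enc}}^c] = 0 \tag{241} \]

establish (238).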

We now prove (241). For each $m \in \{1, \ldots, \lfloor 2^{n(R+\delta)}\rfloor\}$, we index the vectors in the $m$-th bin from 1 to $|\mathcal{B}(m)|$ and we shall refer to the $k$-th vector in this $m$-th bin by $\mathbf{Z}_{m,k}$. Let $K^*$ be the index of $\mathbf{Z}^*$, i.e., $\mathbf{Z}_{M,K^*} = \mathbf{Z}^*$. By the symmetry of the code construction and the encoding rule, the probability $\Pr[\mathcal{E}_2^c | \mathcal{E}_{\mathrm{src}}^c, \mathcal{E}_{\mathrm{enc}}^c, M = m, K^* = k]$ does not depend on the values $m$ and $k$. We therefore assume in the following that $M = 1$ and $K^* = 1$. If we additionally condition on $\mathbf{X} = \mathbf{x}$ and on $\cos\angle(\mathbf{X}, \mathbf{Z}^*) = t_{XZ^*} > 0$, the vectors $\mathbf{Z}_{1,2}, \ldots, \mathbf{Z}_{1,|\mathcal{B}(1)|}$ (i.e., the vectors in bin 1 that are not $\mathbf{Z}^*$) are independent and uniformly distributed over the centered n-sphere of radius $\sqrt{n\sigma_Z^2}$ without the spherical cap of half-angle $\arccos(t_{XZ^*})$ centered at $\mathbf{x}$. Thus, $2/C_n(\pi)$ is an upper bound on the conditional density of the normalized vectors

\[ \frac{\mathbf{Z}_{1,2}}{\sqrt{n\sigma_Z^2}}, \ldots, \frac{\mathbf{Z}_{1,|\mathcal{B}(1)|}}{\sqrt{n\sigma_Z^2}} \]

on the centered unit n-sphere.

Applying Lemma 20 we therefore obtain Inequality (243). We note that for any $\gamma \in [0, 1]$

\[ 0 \le 1 - \frac{2C_n(\arccos(\gamma))}{C_n(\pi)} \le 1 \tag{245} \]

and hence the mapping $t \mapsto \bigl(1 - \frac{2C_n(\arccos(\gamma))}{C_n(\pi)}\bigr)^{t}$ is decreasing in $t > 0$. Therefore, since

\[ |\mathcal{B}(1)| - 1 \le 2^{n(R'-R-\delta)} \tag{246} \]

we further obtain (244). If now we take the expectation with respect to $\mathbf{X}$, $M$, and $K^*$ (but keep the conditioning on $\mathcal{E}_{\mathrm{src}}^c$ and $\mathcal{E}_{\mathrm{enc}}^c$), (244) results in

\[ \Pr[\mathcal{E}_2 | \mathcal{E}_{\mathrm{src}}^c, \mathcal{E}_{\mathrm{enc}}^c] \ge \Bigl(1 - \frac{2C_n\bigl(\arccos\bigl(\sqrt{1 - 2^{-2(R'-R-\delta/2)}}\bigr)\bigr)}{C_n(\pi)}\Bigr)^{2^{n(R'-R-\delta)}}. \tag{247} \]



The desired limit (241) follows by (247) and by Lemma 22. In fact, applying Lemma 22 to

\[ \eta_2 \triangleq R' - R - \delta \tag{248} \]

and to the function

\[ f(n) \triangleq \frac{2C_n\bigl(\arccos\bigl(\sqrt{1 - 2^{-2(R'-R-\delta/2)}}\bigr)\bigr)}{C_n(\pi)} \tag{249} \]

we obtain that the right-hand side of (247) tends to 1 as $n$ tends to infinity because

\begin{align}
\eta_1 &= -\lim_{n\to\infty}\frac{1}{n}\log\Bigl(\frac{2C_n\bigl(\arccos\bigl(\sqrt{1 - 2^{-2(R'-R-\delta/2)}}\bigr)\bigr)}{C_n(\pi)}\Bigr) \tag{250}\\
&= R' - R - \delta/2 > \eta_2. \tag{251}
\end{align}

Here, the equality holds by Lemma 21 and because the factor 2 in the logarithm does not change the limit, and the inequality holds by (248) and because $\delta > 0$.
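The dichotomy in Lemma 22 can be illustrated numerically. The sketch below is only an illustration: it replaces $f(n)$ by the idealized choice $f(n) = 2^{-n\eta_1}$, which has exactly the exponent $\eta_1$, and evaluates $(1 - f(n))^{2^{n\eta_2}}$ in the log domain to avoid overflow. The function name is hypothetical.

```python
import math

def power_limit_term(n, eta1, eta2):
    """(1 - f(n)) ** (2 ** (n * eta2)) for the idealized f(n) = 2 ** (-n * eta1).

    Requires n >= 1 and eta1 > 0 so that f(n) < 1.
    """
    f = 2.0 ** (-n * eta1)
    # log1p keeps precision when f is tiny; the exponent 2**(n*eta2) may be huge.
    return math.exp((2.0 ** (n * eta2)) * math.log1p(-f))
```

With $R' - R = 0.5$ and $\delta = 0.2$, so that $\eta_1 = 0.4$ and $\eta_2 = 0.3$, the term is already very close to 1 at $n = 200$; swapping the two exponents drives it to 0.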

This concludes the proof of limit (241) and thus of the fourth limit (238). Combining finally (222) with (223)–(225) and (238) establishes the proof of the lemma. ■

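We can now bound the expected distortions of our scheme. We have

\[ E\bigl[d_d^{(n)}(\mathbf{X}, \hat{\mathbf{X}}_d)\bigr] = \Pr[\mathcal{E}^c]\,E\bigl[d_d^{(n)}(\mathbf{X}, \hat{\mathbf{X}}_d) \big| \mathcal{E}^c\bigr] + \Pr[\mathcal{E}]\,E\bigl[d_d^{(n)}(\mathbf{X}, \hat{\mathbf{X}}_d) \big| \mathcal{E}\bigr] \tag{252} \]

and

\[ E\bigl[d_e^{(n)}(\hat{\mathbf{X}}_d, \hat{\mathbf{X}}_e)\bigr] = \Pr[\mathcal{E}^c]\,E\bigl[d_e^{(n)}(\hat{\mathbf{X}}_d, \hat{\mathbf{X}}_e) \big| \mathcal{E}^c\bigr] + \Pr[\mathcal{E}]\,E\bigl[d_e^{(n)}(\hat{\mathbf{X}}_d, \hat{\mathbf{X}}_e) \big| \mathcal{E}\bigr]. \tag{253} \]

The decoder-side distortion satisfies

\begin{align}
d_d^{(n)}(\mathbf{x}, \hat{\mathbf{x}}_d) &= \frac{1}{n}\|\mathbf{x} - \hat{\mathbf{z}} - b\mathbf{y}\|^2 \tag{254}\\
&\le \frac{3}{n}\bigl(\|\mathbf{x}\|^2 + \|\hat{\mathbf{z}}\|^2 + b^2\|\mathbf{y}\|^2\bigr) \tag{255}
\end{align}

where the inequality holds by the Cauchy-Schwarz Inequality and because an arithmetic mean of two nonnegative numbers cannot be smaller than its geometric mean. Since all codewords have the same norm, $\|\hat{\mathbf{Z}}\| = \|\mathbf{Z}^*\|$. Therefore,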



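\begin{align}
\Pr[\mathcal{E}]\,E\bigl[d_d^{(n)}(\mathbf{X}, \hat{\mathbf{X}}_d) \big| \mathcal{E}\bigr] &\le \frac{3}{n}\Pr[\mathcal{E}]\,E\bigl[\|\mathbf{X}\|^2 + \|\mathbf{Z}^*\|^2 + b^2\|\mathbf{Y}\|^2 \big| \mathcal{E}\bigr] \tag{256}\\
&= \frac{3}{n}E\bigl[\|\mathbf{X}\|^2 + \|\mathbf{Z}^*\|^2 + b^2\|\mathbf{Y}\|^2\bigr] - \frac{3}{n}\Pr[\mathcal{E}^c]\,E\bigl[\|\mathbf{X}\|^2 + \|\mathbf{Z}^*\|^2 + b^2\|\mathbf{Y}\|^2 \big| \mathcal{E}^c\bigr] \tag{257}\\
&\le 3\bigl(\sigma_X^2 + \sigma_Z^2 + b^2(\sigma_X^2 + \sigma_U^2)\bigr) - 3\bigl(\sigma_X^2(1-\epsilon) + \sigma_Z^2 + b^2(\sigma_X^2 + \sigma_U^2)(1-\epsilon)\bigr)\Pr[\mathcal{E}^c] \tag{258}\\
&\le 3\bigl(\sigma_X^2 + \sigma_Z^2 + b^2(\sigma_X^2 + \sigma_U^2)\bigr)\bigl(1 - (1-\epsilon)\Pr[\mathcal{E}^c]\bigr). \tag{259}
\end{align}

The wide displays (236), (237), and (242)–(244) referenced in the proof of Lemma 24 are

\[ \lim_{n\to\infty}\Pr\Bigl[\bigl|\langle \mathbf{Y}^{\perp}, \mathbf{Z}^{*\perp}\rangle\bigr| \le \epsilon\sqrt{1 - 2^{-2(R'-R)}}\,\|\mathbf{y}\|\sqrt{n\sigma_Z^2} \Bigm| \mathcal{E}_{\mathrm{src}}^c, \mathcal{E}_{\mathrm{enc}}^c, \mathbf{X} = \mathbf{x}, \mathbf{Y} = \mathbf{y}, \cos\angle(\mathbf{X}, \mathbf{Z}^*) = t_{XZ^*}\Bigr] = 1, \tag{236} \]

\[ \lim_{n\to\infty}\Pr\Bigl[\Bigl|\cos\angle(\mathbf{Y}, \mathbf{Z}^*) - \sqrt{1 - 2^{-2(R'-R)}}\Bigr| \le 4\epsilon\sqrt{1 - 2^{-2(R'-R)}} \Bigm| \mathcal{E}_{\mathrm{src}}^c, \mathcal{E}_{\mathrm{enc}}^c, \mathbf{X} = \mathbf{x}, \mathbf{Y} = \mathbf{y}, \cos\angle(\mathbf{X}, \mathbf{Z}^*) = t_{XZ^*}\Bigr] = 1, \tag{237} \]

and

\begin{align}
\Pr\Bigl[\bigcup_{k=2}^{|\mathcal{B}(1)|}&\Bigl\{\cos\angle(\mathbf{Y}, \mathbf{Z}_{1,k}) > \sqrt{1 - 2^{-2(R'-R-\delta/2)}}\Bigr\} \Bigm| \mathbf{X} = \mathbf{x}, M = 1, K^* = 1, \mathcal{E}_{\mathrm{src}}^c, \mathcal{E}_{\mathrm{enc}}^c\Bigr]\\
&= 1 - \prod_{k=2}^{|\mathcal{B}(1)|}\Bigl(1 - \Pr\Bigl[\cos\angle(\mathbf{Y}, \mathbf{Z}_{1,k}) > \sqrt{1 - 2^{-2(R'-R-\delta/2)}} \Bigm| \mathbf{X} = \mathbf{x}, M = 1, K^* = 1, \mathcal{E}_{\mathrm{src}}^c, \mathcal{E}_{\mathrm{enc}}^c\Bigr]\Bigr) \tag{242}\\
&\le 1 - \Bigl(1 - \frac{2C_n\bigl(\arccos\bigl(\sqrt{1 - 2^{-2(R'-R-\delta/2)}}\bigr)\bigr)}{C_n(\pi)}\Bigr)^{|\mathcal{B}(1)|-1} \tag{243}\\
&\le 1 - \Bigl(1 - \frac{2C_n\bigl(\arccos\bigl(\sqrt{1 - 2^{-2(R'-R-\delta/2)}}\bigr)\bigr)}{C_n(\pi)}\Bigr)^{2^{n(R'-R-\delta)}}. \tag{244}
\end{align}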

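In the event $\mathcal{E}^c$, we can derive a bound on the decoder-side distortion $d_d^{(n)}(\mathbf{x}, \hat{\mathbf{x}}_d)$ that is tighter than (255):

\begin{align}
d_d^{(n)}(\mathbf{x}, \hat{\mathbf{x}}_d) &= \frac{1}{n}\|\mathbf{x} - \mathbf{z}^* - b\mathbf{y}\|^2 \tag{260}\\
&= \frac{1}{n}\|\mathbf{x}\|^2 + \frac{1}{n}\|\mathbf{z}^*\|^2 + \frac{b^2}{n}\|\mathbf{y}\|^2 - \frac{2}{n}\langle \mathbf{x}, \mathbf{z}^*\rangle - \frac{2b}{n}\langle \mathbf{x}, \mathbf{y}\rangle + \frac{2b}{n}\langle \mathbf{z}^*, \mathbf{y}\rangle \tag{261}\\
&\le (1+\epsilon)\sigma_X^2 + \sigma_Z^2 + (1+\epsilon)b^2(\sigma_X^2 + \sigma_U^2) - 2(1-\epsilon)^2 a\sigma_X^2 - 2(1-\epsilon)^3 b\sigma_X^2 + 2(1+\epsilon)(1+4\epsilon)ab\sigma_X^2 \tag{262}\\
&\le (1 + a^2 + b^2 - 2a - 2b + 2ab)\sigma_X^2 + a^2\sigma_W^2 + b^2\sigma_U^2 + \epsilon\bigl(\sigma_X^2 + b^2(\sigma_X^2 + \sigma_U^2) + 4a\sigma_X^2 + 6b\sigma_X^2 + 10ab\sigma_X^2\bigr) + 8\epsilon^2 ab\sigma_X^2 + 2\epsilon^3 b\sigma_X^2 \tag{263}\\
&\le D_d + \epsilon\bigl(\sigma_X^2 + b^2(\sigma_X^2 + \sigma_U^2) + 4a\sigma_X^2 + 8b\sigma_X^2 + 18ab\sigma_X^2\bigr) \tag{264}
\end{align}

where the first inequality follows from the definition of the event $\mathcal{E}^c$, the second by throwing away some negative $\epsilon$-terms, and the third from Condition (207) and because $\epsilon < 1$. Since $\Pr[\mathcal{E}^c] \le 1$, we thus have:

\[ \Pr[\mathcal{E}^c]\,E\bigl[d_d^{(n)}(\mathbf{X}, \hat{\mathbf{X}}_d) \big| \mathcal{E}^c\bigr] \le D_d + \epsilon\bigl(\sigma_X^2 + b^2(\sigma_X^2 + \sigma_U^2) + 4a\sigma_X^2 + 8b\sigma_X^2 + 18ab\sigma_X^2\bigr). \tag{265} \]

Combining (252), (259), and (265), we obtain

\begin{align}
E\bigl[d_d^{(n)}(\mathbf{X}, \hat{\mathbf{X}}_d)\bigr] \tag{266}\\
&\le D_d + 3\bigl(\sigma_X^2 + \sigma_Z^2 + b^2(\sigma_X^2 + \sigma_U^2)\bigr)\bigl(1 - (1-\epsilon)\Pr[\mathcal{E}^c]\bigr) + \epsilon\bigl(\sigma_X^2 + b^2(\sigma_X^2 + \sigma_U^2) + 4a\sigma_X^2 + 8b\sigma_X^2 + 18ab\sigma_X^2\bigr). \tag{267}
\end{align}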

Similarly, we have for the encoder-side distortion: 

4")(x,x d )-i||6y-6x|| 2 

2 



n 
2. 



<-& 2 ||yH 2 + -& 2 ||x|| 2 

n n 

and thus, 

Pr[£]Ek")(X d ,X e )|£ 



(268) 
(269) 



< -E[& 2 ||Y|| 2 + 6 2 ||X|| 2 ] 



--Pi[£ c ]E 



b 2 \\Y\ 



6 2 ||X|| 



(270) 



< 2(b 2 (o 2 x + o 2 v ) + b 2 o 2 x ) (l - (1 - e) Pr[£ c ] ). (271) 

Moreover, in the event £ c we can derive a bound on the 
encoder-side distortion (x d , x e ) that is tighter than ( |269t : 

4")(x d ,x c )-i||6y-6x|| 2 (272) 

= i6 2 (||x|| 2 + ||y|| 2 -2(x,y)) (273) 

<(l + e)b 2 a 2 x + (l + e)b 2 (o x + o 2 ) 
- 26 2 (1 - tfo\ (274) 
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<b 2 a 2 v 



eb 2 



'x 



a 2 v ) + e 3 b 2 a 



x 



< D c + eb 2 (9cr x +cr, 



(275) 
(276) 



where the last inequality follows by Assumption (12081 l and 
because e < 1. Since Pr[£ c ] < 1, we thus have 



Pr[£ c ]EU")(X dl X e )|r 



< D c + eb 2 (9(7 2 x +<rl). (277) 



Combining finally ( 12531 ). dZTTT) . and j277l) . we obtain 



4")(X d ,X e ) 



< _D C + 2^b 2 (Ty t 
+eb 2 (9cr 2 x + a\ 



b 2 a\ 



(l-e)Prr]) 



(278) 



Recall that the rate of our scheme is smaller than $R + \delta$ and that $\epsilon, \delta > 0$ can be chosen arbitrarily close to 0. Therefore, from (267), (278), and Lemma 24 we conclude that when $a, \sigma_w^2 > 0$ and $b > 0$ satisfy (207) and (208), then our scheme can achieve the triple



'X U U 



'x 



u) u w 



,D A ,D f 



(279) 



This establishes the proposition.



Appendix D 
The Cardinality Bound on U 



To prove the cardinality bound (160) on $\mathcal{U}$, we shall need the following variation on Carathéodory's theorem.

Lemma 25: Any point on the boundary of the convex hull of a compact set in $\mathbb{R}^d$ can be expressed as a convex combination of $d$ or fewer points in the set.

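Proof: Let $\mathcal{S}$ be a compact subset of $\mathbb{R}^d$, and let $\mathbf{x}$ be a boundary point of its convex hull $\operatorname{conv}(\mathcal{S})$. Since $\mathbf{x}$ is in the convex hull of $\mathcal{S}$, it follows from Carathéodory's theorem that there exist $d + 1$ or fewer points
\begin{equation}
\mathbf{x}_1, \ldots, \mathbf{x}_\nu \in \mathcal{S}, \qquad \nu \le d + 1 \tag{280}
\end{equation}
and positive coefficients summing to 1
\begin{equation}
\lambda_1, \ldots, \lambda_\nu > 0, \qquad \sum_{i=1}^{\nu} \lambda_i = 1 \tag{281}
\end{equation}
such that
\begin{equation}
\mathbf{x} = \sum_{i=1}^{\nu} \lambda_i \mathbf{x}_i. \tag{282}
\end{equation}
We shall show that, in fact, of these $\nu$ points, we can find $d$ or fewer points whose convex combination is $\mathbf{x}$.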

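Since $\mathbf{x}$ is on the boundary of $\operatorname{conv}(\mathcal{S})$, there exists a hyperplane $\mathcal{H}$ that supports $\operatorname{conv}(\mathcal{S})$ at $\mathbf{x}$. Thus,
\begin{equation}
\mathcal{H} = \bigl\{\boldsymbol{\xi} \in \mathbb{R}^d : \mathbf{c}^{\mathsf{T}}\boldsymbol{\xi} = \mathbf{c}^{\mathsf{T}}\mathbf{x}\bigr\} \tag{283a}
\end{equation}
for some vector $\mathbf{c} \in \mathbb{R}^d$ and
\begin{equation}
\mathbf{c}^{\mathsf{T}}\mathbf{x} \ge \mathbf{c}^{\mathsf{T}}\boldsymbol{\xi}, \qquad \boldsymbol{\xi} \in \operatorname{conv}(\mathcal{S}), \tag{283b}
\end{equation}
so
\begin{equation}
\mathbf{c}^{\mathsf{T}}\mathbf{x} \ge \mathbf{c}^{\mathsf{T}}\mathbf{x}_i, \qquad i = 1, \ldots, \nu. \tag{284}
\end{equation}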



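We shall next show that the points $\mathbf{x}_1, \ldots, \mathbf{x}_\nu$ are in $\mathcal{H}$. To that end we note that by (282)
\begin{align*}
0 &= \mathbf{c}^{\mathsf{T}}\mathbf{x} - \mathbf{c}^{\mathsf{T}}\sum_{i=1}^{\nu} \lambda_i \mathbf{x}_i \\
&= \sum_{i=1}^{\nu} \lambda_i \bigl(\mathbf{c}^{\mathsf{T}}\mathbf{x} - \mathbf{c}^{\mathsf{T}}\mathbf{x}_i\bigr),
\end{align*}
where the second equality holds because the $\lambda$'s sum to 1 (281). Since the $\lambda$'s are all positive, it follows from (284) that all the terms on the RHS are nonnegative. Since they sum to zero, they must all be zero. And since the $\lambda$'s are positive, we conclude that
\begin{equation}
\mathbf{c}^{\mathsf{T}}\mathbf{x}_i = \mathbf{c}^{\mathsf{T}}\mathbf{x}, \qquad i \in \{1, \ldots, \nu\}, \tag{285}
\end{equation}
and the vectors $\mathbf{x}_i$ are all in $\mathcal{H}$. The vector $\mathbf{x}$ can thus be written as a convex combination of the $\nu$ vectors $\mathbf{x}_1, \ldots, \mathbf{x}_\nu$ in $\mathcal{H}$. Since $\mathcal{H}$ is $(d - 1)$-dimensional, it follows from Carathéodory's theorem that $\mathbf{x}$ is in fact a convex combination of $d$ or fewer of the vectors $\mathbf{x}_1, \ldots, \mathbf{x}_\nu$. ■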
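The argument of Lemma 25 can be traced on a small planar example ($d = 2$). The sketch below is only an illustration under stated assumptions: the point set, the weights, the supporting vector `c`, and all function names are hypothetical, and the reduction step is written for the plane only, where the supporting hyperplane is a line.

```python
def dot(u, v):
    """Inner product of two 2-D tuples."""
    return sum(a * b for a, b in zip(u, v))


def reduce_on_boundary_2d(points, weights, c, tol=1e-9):
    """Rewrite x = sum_i weights[i] * points[i] using at most 2 of the points.

    Assumes positive weights summing to 1 and that c supports the convex hull
    at x, i.e. c.x >= c.p for every point p, cf. (284).
    """
    x = tuple(sum(w * p[k] for p, w in zip(points, weights)) for k in range(2))
    gaps = [dot(c, x) - dot(c, p) for p in points]
    # Nonnegative terms that average to zero must all vanish, cf. (285).
    assert all(abs(g) <= tol for g in gaps)
    # Coordinate of each point along the supporting line H.
    t = (-c[1], c[0])
    tx = dot(t, x)
    left = max((p for p in points if dot(t, p) <= tx + tol), key=lambda p: dot(t, p))
    right = min((p for p in points if dot(t, p) >= tx - tol), key=lambda p: dot(t, p))
    tl, tr = dot(t, left), dot(t, right)
    if abs(tr - tl) <= tol:
        return x, [(1.0, left)]
    mu = (tr - tx) / (tr - tl)
    return x, [(mu, left), (1.0 - mu, right)]


# Three points on the top edge of the unit square; c = (0, 1) supports it there.
pts = [(0.0, 1.0), (1.0, 1.0), (0.25, 1.0)]
lam = [0.2, 0.4, 0.4]
x, mix = reduce_on_boundary_2d(pts, lam, c=(0.0, 1.0))
rebuilt = tuple(sum(m * p[k] for m, p in mix) for k in range(2))
```

Here the representation with $\nu = d + 1 = 3$ points is replaced by one with $d = 2$ points, the two neighbours of $\mathbf{x}$ on the supporting line.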
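The cardinality bound on $\mathcal{U}$ can now be proved as follows.

Proof of the Cardinality Bound on $\mathcal{U}$ in Proposition 16: Let the discrete random variables $U$ and $Z$ over the alphabets $\mathcal{U}$ and $\mathcal{Z}$, the function $\phi\colon \mathcal{Y} \times \mathcal{Z} \to \hat{\mathcal{X}}_d$, and the function $\psi\colon \mathcal{X} \times \mathcal{Z} \times \mathcal{U} \to \hat{\mathcal{X}}_e$ satisfy (157) and (158). We shall exhibit a random variable $\tilde{U}$ over the alphabet
\begin{equation}
\tilde{\mathcal{U}} = \{1, \ldots, K\} \tag{286}
\end{equation}
and a function $\tilde{\psi}\colon \mathcal{X} \times \mathcal{Z} \times \tilde{\mathcal{U}} \to \hat{\mathcal{X}}_e$ satisfying
\begin{equation}
\tilde{U} \;-\!\!\circ\!\!-\; (X, Z) \;-\!\!\circ\!\!-\; Y \tag{287}
\end{equation}
and the $K$ distortion constraints
\begin{equation}
E\bigl[d_k\bigl(X, \phi(Y, Z), \tilde{\psi}(X, Z, \tilde{U})\bigr)\bigr] \le D_k, \qquad k \in \{1, \ldots, K\}. \tag{288}
\end{equation}
Since the Markov conditions (157) and (287) imply
\begin{equation}
(\tilde{U}, Z) \;-\!\!\circ\!\!-\; X \;-\!\!\circ\!\!-\; Y, \tag{289}
\end{equation}
this will allow us to replace $U$ and $\psi$ with $\tilde{U}$ and $\tilde{\psi}$ and thus conclude the proof.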

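To describe $\tilde{U}$ and $\tilde{\psi}$, we need some definitions. For each pair $(x, z) \in \mathcal{X} \times \mathcal{Z}$ and each $k \in \{1, \ldots, K\}$, define
\begin{align}
D_k^{(x,z)} &\triangleq E\bigl[d_k\bigl(X, \phi(Y, Z), \psi(X, Z, U)\bigr) \,\big|\, (X, Z) = (x, z)\bigr] \nonumber\\
&= E\bigl[d_k\bigl(x, \phi(Y, z), \psi(x, z, U)\bigr)\bigr], \tag{290}
\end{align}
where the expectation is, by (157), with respect to $P_{U|XZ}(\cdot|x, z)\, P_{Y|X}(\cdot|x)$. Define also the vector-valued function
\begin{equation}
\mathbf{h}^{(x,z)}\colon \mathcal{U} \to \mathbb{R}^K, \qquad
u \mapsto
\begin{pmatrix}
E\bigl[d_1\bigl(x, \phi(Y, z), \psi(x, z, u)\bigr)\bigr] \\
\vdots \\
E\bigl[d_K\bigl(x, \phi(Y, z), \psi(x, z, u)\bigr)\bigr]
\end{pmatrix}, \tag{291}
\end{equation}
where the expectation is with respect to $P_{Y|X}(\cdot|x)$. Let $\mathcal{S}^{(x,z)}$ denote the image of $\mathbf{h}^{(x,z)}$:
\begin{equation}
\mathcal{S}^{(x,z)} \triangleq \bigl\{\mathbf{s} \in \mathbb{R}^K : \mathbf{s} = \mathbf{h}^{(x,z)}(u) \text{ for some } u \in \mathcal{U}\bigr\}. \tag{292}
\end{equation}
By definitions (290)–(292),
\begin{equation}
\begin{pmatrix} D_1^{(x,z)} \\ \vdots \\ D_K^{(x,z)} \end{pmatrix} \in \operatorname{conv}\bigl(\mathcal{S}^{(x,z)}\bigr) \tag{293}
\end{equation}
and, consequently, there exists a point
\[
\mathbf{s}^{(x,z)} = \bigl(s_1^{(x,z)}, \ldots, s_K^{(x,z)}\bigr)^{\mathsf{T}}
\]
on the boundary of $\operatorname{conv}\bigl(\mathcal{S}^{(x,z)}\bigr)$ with
\begin{equation}
s_k^{(x,z)} \le D_k^{(x,z)}, \qquad k \in \{1, \ldots, K\}. \tag{294}
\end{equation}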
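The two claims just made, membership in the convex hull and the existence of a dominated boundary point, are stated without detail. One possible way to see them is sketched below. The all-ones vector $\mathbf{1}$, the stacked vector $\mathbf{D}^{(x,z)}$, and the scalar $t^\star$ are notation introduced only for this sketch, and the author's intended argument may differ.

```latex
% Membership (293): by the product form of the law noted after (290),
\begin{align*}
D_k^{(x,z)}
  &= \sum_{u \in \mathcal{U}} P_{U|XZ}(u|x,z)\,
     E\bigl[d_k\bigl(x, \phi(Y,z), \psi(x,z,u)\bigr)\bigr] \\
  &= \sum_{u \in \mathcal{U}} P_{U|XZ}(u|x,z)\, h_k^{(x,z)}(u),
\end{align*}
% so D^{(x,z)} = (D_1^{(x,z)}, ..., D_K^{(x,z)})^T is a convex combination
% of points of S^{(x,z)}.
%
% Dominated boundary point (294): slide from D^{(x,z)} along -1 until
% the hull is about to be left,
\begin{align*}
t^\star &= \max\bigl\{t \ge 0 :
   \mathbf{D}^{(x,z)} - t\,\mathbf{1} \in
   \operatorname{conv}\bigl(\mathcal{S}^{(x,z)}\bigr)\bigr\}, \\
\mathbf{s}^{(x,z)} &= \mathbf{D}^{(x,z)} - t^\star\,\mathbf{1}.
\end{align*}
% The maximum exists because the hull is compact and t = 0 is feasible.
% Points D^{(x,z)} - t 1 with t > t* lie outside the hull, so s^{(x,z)} is a
% boundary point, and s_k^{(x,z)} = D_k^{(x,z)} - t* <= D_k^{(x,z)} for all k.
```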



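Since $\mathcal{S}^{(x,z)}$ is compact (it contains at most $|\hat{\mathcal{X}}_e|$ points because $\mathbf{h}^{(x,z)}(u)$ depends on $u$ only via $\psi(x, z, u)$), Lemma 25 implies that $\mathbf{s}^{(x,z)}$ can be written as a convex combination of $K$ or fewer points in $\mathcal{S}^{(x,z)}$:
\begin{equation}
\mathbf{s}^{(x,z)} = \sum_{j=1}^{K} \lambda_j^{(x,z)}\, \mathbf{s}_j^{(x,z)}, \tag{295}
\end{equation}
where $\mathbf{s}_1^{(x,z)}, \ldots, \mathbf{s}_K^{(x,z)} \in \mathcal{S}^{(x,z)}$ and the coefficients $\lambda_1^{(x,z)}, \ldots, \lambda_K^{(x,z)} \in [0, 1]$ sum to 1. Let $u_1^{(x,z)}, \ldots, u_K^{(x,z)} \in \mathcal{U}$ be preimages of $\mathbf{s}_1^{(x,z)}, \ldots, \mathbf{s}_K^{(x,z)}$, so
\begin{equation}
\mathbf{h}^{(x,z)}\bigl(u_j^{(x,z)}\bigr) = \mathbf{s}_j^{(x,z)}, \qquad j \in \{1, \ldots, K\}. \tag{296}
\end{equation}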

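We can now define the function $\tilde{\psi}$ as mapping every pair $(x, z) \in \mathcal{X} \times \mathcal{Z}$ and every $j \in \{1, \ldots, K\}$ to
\begin{equation}
\tilde{\psi}(x, z, j) = \psi\bigl(x, z, u_j^{(x,z)}\bigr). \tag{297}
\end{equation}
And we define the random variable $\tilde{U}$ to be conditionally independent of $Y$ given $(X, Z)$ with the conditional law
\begin{equation}
\Pr\bigl[\tilde{U} = j \,\big|\, X = x, Z = z\bigr] = \lambda_j^{(x,z)}, \qquad j \in \{1, \ldots, K\}. \tag{298}
\end{equation}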

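The Markov condition (287) thus holds by definition. Moreover, (290), (291), and (294)–(298) combine to prove that $\tilde{U}$ and $\tilde{\psi}$ also satisfy the $K$ distortion constraints in (288): denoting the $k$-th component of the vector $\mathbf{s}_j^{(x,z)}$ by $s_{j,k}^{(x,z)}$, for $j, k \in \{1, \ldots, K\}$,
\begin{align}
E\bigl[d_k\bigl(x, \phi(Y, z), \tilde{\psi}(x, z, \tilde{U})\bigr)\bigr]
&= \sum_{j=1}^{K} \lambda_j^{(x,z)}\, E\bigl[d_k\bigl(x, \phi(Y, z), \tilde{\psi}(x, z, j)\bigr)\bigr] \tag{299}\\
&= \sum_{j=1}^{K} \lambda_j^{(x,z)}\, E\bigl[d_k\bigl(x, \phi(Y, z), \psi(x, z, u_j^{(x,z)})\bigr)\bigr] \tag{300}\\
&= \sum_{j=1}^{K} \lambda_j^{(x,z)}\, s_{j,k}^{(x,z)} \tag{301}\\
&= s_k^{(x,z)} \tag{302}\\
&\le D_k^{(x,z)}, \tag{303}
\end{align}
where the first equality holds by (298), the second equality by (297), the third equality by (291) and (296), the fourth equality by (295), and the inequality at the end by (294).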
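Finally, from (303) we conclude that
\begin{align}
E\bigl[d_k\bigl(X, \phi(Y, Z), \tilde{\psi}(X, Z, \tilde{U})\bigr)\bigr]
&= \sum_{x \in \mathcal{X},\, z \in \mathcal{Z}} \Pr[X = x, Z = z]\, E\bigl[d_k\bigl(x, \phi(Y, z), \tilde{\psi}(x, z, \tilde{U})\bigr)\bigr] \tag{304}\\
&\le \sum_{x \in \mathcal{X},\, z \in \mathcal{Z}} \Pr[X = x, Z = z]\, D_k^{(x,z)} \tag{305}\\
&\le D_k, \tag{306}
\end{align}
where the last inequality follows from the definition of $D_k^{(x,z)}$ in (290) and the fact that the tuple $(U, Z, \phi, \psi)$ satisfies the original distortion constraints in (158). ■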
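The construction in the proof can be illustrated for $K = 2$ and one fixed pair $(x, z)$. The sketch below is a brute-force stand-in under stated assumptions, not the proof's method: `points[u]` plays the role of $\mathbf{h}^{(x,z)}(u)$, `probs[u]` that of $P_{U|XZ}(u|x, z)$, and all names and numbers are hypothetical. It searches for a mixture of at most two image points that is componentwise no larger than the averaged vector, by minimizing the first component subject to the second not exceeding its target. The minimum of this linear problem over the hull is attained at a point of the set or at a crossing of a segment with the constraint line, so enumerating those candidates suffices.

```python
from itertools import combinations

TOL = 1e-12


def mixture(mix):
    """Return sum_j lambda_j * s_j for mix = [(lambda_j, s_j), ...] in R^2."""
    return tuple(sum(lam * s[k] for lam, s in mix) for k in range(2))


def reduce_alphabet_k2(points, probs):
    """Find at most two (weight, point) pairs whose mixture is dominated by
    the averaged vector D = sum_u probs[u] * points[u] (illustrative, K = 2)."""
    target = mixture(list(zip(probs, points)))
    # Candidates: single points, and crossings of segments with {s_2 = D_2}.
    candidates = [[(1.0, p)] for p in points]
    for p, q in combinations(points, 2):
        if (p[1] - target[1]) * (q[1] - target[1]) < 0:
            lam = (target[1] - q[1]) / (p[1] - q[1])
            candidates.append([(lam, p), (1.0 - lam, q)])
    best = None
    for mix in candidates:
        s = mixture(mix)
        # Keep the feasible candidate with the smallest first component.
        if s[1] <= target[1] + TOL and (best is None or s[0] < mixture(best)[0]):
            best = mix
    return target, best


# Hypothetical images h(u) of four letters u, used with equal probability.
points = [(0.0, 4.0), (4.0, 0.0), (4.0, 4.0), (1.0, 1.0)]
probs = [0.25, 0.25, 0.25, 0.25]
target, mix = reduce_alphabet_k2(points, probs)
reduced = mixture(mix)
```

The weights in `mix` correspond to the conditional law (298) of the new auxiliary variable, and the selected points to the preimages in (296). A four-letter alphabet is replaced by one with at most $K = 2$ letters without increasing either averaged distortion.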